Exploring Signature-Based Model Calibration for Streamflow Prediction in Ungauged Basins

Calibration of precipitation-streamflow models to streamflow signatures is a promising approach for streamflow prediction in ungauged basins (PUB). The estimation of parameter and prediction uncertainty in this case is not trivial because: (a) calibration takes place in the signature domain, while predictions are required in the time domain, and (b) streamflow signatures are estimated (e.g., from donor catchments) rather than “observed” (computed from observed streamflow in the target catchment), and therefore particularly uncertain. This study investigates model calibration using estimated signatures in an Approximate Bayesian Computation framework. First, we construct a stochastic signature transfer model, based on seasonal flow duration curves. Second, we calibrate a precipitation-streamflow model to the estimated signatures, accounting for their uncertainty. The proposed method is tested in six catchments of the Thur basin, Switzerland. Three data availability scenarios are considered: (a) concomitant scenario, where signatures are “observed,” (b) non-concomitant scenario, where signatures are transferred from a different time period, and (c) regionalization scenario, where signatures are transferred from (neighboring) donor catchments. In this study, the switch from observed to regionalized signatures increases predictive streamflow uncertainty by 38% and worsens the (deterministic) fit to observations by 17% (in terms of Nash-Sutcliffe efficiency). Despite this deterioration, posterior predictive uncertainty remains lower than prior predictive uncertainty generated using uniform priors over representative parameter ranges (“uncalibrated” model), which demonstrates the effectiveness of the proposed signature-based calibration. More importantly, uncertainty is reliably estimated at the ungauged catchments, which represents a key advance in stochastic streamflow PUB.

1. Non-overlapping time series, where observed inputs (e.g., precipitation) and observed outputs (e.g., streamflow) are available in the target catchment but in different time periods. This scenario, which we refer to as "non-concomitant calibration" (NCC), represents arguably the simplest ungauged scenario. In this scenario, the signatures are "transferred" (extrapolated) in time, in the simplest case by assuming signatures are constant in time. NCC is relevant to catchments where old streamflow time series are available without corresponding precipitation records (e.g., Winsemius et al., 2009), as well as to catchments where streamflow characteristics have been heavily influenced by recent hydraulic infrastructures (e.g., Hingray et al., 2010). NCC was explored by Montanari and Toth (2007), who proposed the use of a likelihood based on the spectral properties of the time series.
2. Streamflow data available from donor catchments. In this scenario, streamflow signatures in the target catchment are estimated by the modeler using data from donor catchments, that is, the signatures are transferred (extrapolated) in space, and possibly also in time, depending on the setup. These approaches are typically based on regression between signatures and catchment attributes such as climatic and landscape properties (e.g., Addor et al., 2018; Berger & Entekhabi, 2001; Castiglioni et al., 2009; Prieto et al., 2019; Yadav et al., 2007).
3. Streamflow data not available. This scenario is even more restrictive than scenario 2, and requires the regression relating signatures and catchment attributes to be constructed offline as a standalone model, so that, in the given application, the modeler need not find donor catchments. A notable example of such models was developed by Botter et al. (2009) and used later by Doulatyari et al. (2015). This model expresses the seasonal flow regime as a function of four physically based parameters that embed the geomorphic and climate features of the contributing catchment. Similar signature models were developed by Booker and Woods (2014) and Betterle et al. (2017).
DAL MOLIN ET AL. 10.1029/2022WR031929
An important challenge associated with data availability and model approximations is the treatment of streamflow predictive uncertainty. Predictive uncertainty quantification when the model is calibrated to estimated signatures is particularly challenging. First, calibration takes place in the signature domain, while predictions are necessary in the time domain. Second, streamflow signatures are extrapolated from other catchments rather than calculated directly from local observed streamflow data, and hence may be particularly uncertain.
Uncertainty estimation in the context of PUB has relied mainly on heuristic "limits of acceptability" approaches. For example, Westerberg et al. (2011) searched for precipitation-streamflow model parameters that produce FDCs similar to the observed FDCs at selected quantiles. Winsemius et al. (2009) used observed signatures to impose hard and soft constraints to select suitable parameter sets. Yadav et al. (2007) used regionalized signatures (RS) and accepted only model parameters that produce signatures that fall into an a priori estimated range of variability around the RS. A limitation of these approaches from our perspective in this study is that they yield predictive uncertainty estimates that are not interpretable in a statistical sense.
Although the Bayesian approach could in principle address these limitations, its applications to date have not focused on the quantification of the different sources of streamflow uncertainty in PUB. For example, previous studies have explored methodologies for conditioning model parameters on RS (e.g., Bulygina et al., 2009, 2011; Castiglioni et al., 2010; Prieto et al., 2019) but have considered mainly the uncertainty related to the regionalization process and neglected other sources (e.g., uncertainty related to the precipitation-streamflow model); usually, these studies weigh model parameter sets based on their ability to produce simulated signatures that are close to the regionalized ones and then report the parametric predictive uncertainty of the simulated hydrographs. Other studies (e.g., Almeida et al., 2016) report general model performance (e.g., Nash-Sutcliffe efficiency (NSE)) and its ability to represent the signatures without attempting to estimate predictive uncertainty.
The estimation of uncertainty in the time domain while performing calibration in the signature domain can be pursued using Approximate Bayesian Computation (ABC), as demonstrated in previous publications (e.g., Fenicia et al., 2018; Kavetski et al., 2018; Nott et al., 2014; Vrugt & Sadegh, 2013). Efficient implementations of the ABC approach have been proposed, including the SABC algorithm (Albert et al., 2015), DREAM (ABC) (Sadegh & Vrugt, 2014), and others. However, these previous studies focused on scenarios where signatures are computed directly from available streamflow observations, which is not the case for ungauged catchments.
In this study, we focus on streamflow prediction and uncertainty estimation at ungauged locations. In order to calibrate a precipitation-streamflow model to uncertain signatures, we propose a Bayesian inference framework implemented using ABC.
Our aims are as follows:
1. Introduce a novel Bayesian approach for the calibration of precipitation-streamflow models to signatures that are estimated, with substantial uncertainty, using a signature transfer model. This advance builds on previous work where the signatures were computed directly from observed streamflow in the catchment of interest.
2. Assess the ability of the proposed signature-domain calibration to provide reliable and precise predictions, in the following data availability scenarios:
   a) Calibration to concomitant signatures (CS), that is, using observed FDCs. This scenario serves as a baseline;
   b) Calibration to non-concomitant signatures (NCS), that is, extrapolating the FDCs from another time period;
   c) Calibration to RS, that is, extrapolating the FDCs from neighboring donor catchments.
3. Assess the performance of the proposed signature-domain calibration in broader contexts, including in comparison to:
   a) Classical time-domain calibration, in order to appraise potential loss of quality in model predictions;
   b) Prediction using prior parameter ranges, in order to appraise the extent to which calibration to RS is able to constrain model predictions.
The proposed approach provides separate treatment of two key sources of uncertainty that arise in the modeling process, namely: (a) signature transfer uncertainty, resulting from uncertainty in the data, structure and parameters of the signature transfer model and (b) hydrological model uncertainty, resulting from the data, structure and parameters of the precipitation-streamflow model. As part of the development, the SABC algorithm proposed in earlier work is generalized to accommodate the case of stochastic (rather than fixed) signatures. A key focus of our evaluation is on the (statistical) reliability of the predictions, that is, on the ability to estimate the magnitude of uncertainty and characterize its distributional properties.
The case study is based on data from six catchments of the Thur basin in Switzerland, which are used to simulate multiple ungauged scenarios. A lumped conceptual precipitation-streamflow model is applied separately in each of the catchments. Our choice of signatures is based on seasonal FDCs. The signature model is based on Doulatyari et al. (2017), but here we enhance it with its own error model (i.e., the signature model is enhanced from deterministic to stochastic), in order to represent uncertainty in the signature regionalization.
The paper is organized as follows. Section 2 presents the theory behind the proposed methodology and Section 3 details the algorithm implementation. Section 4 describes the case study setup, Section 5 reports the case study results, and Section 6 discusses these results and their implications. Section 7 draws the conclusions.

General Overview
Figure 1 shows a schematic of the signature-based inference framework proposed in this study. The following scenarios are central to this study:
• CS, where streamflow observations are available at the location and time period of interest. The signatures are calculated directly from this observed data. This "gauged catchment" scenario provides the baseline for the other two scenarios.
• NCS, where streamflow observations are available at the location of interest but not in the time period of interest. This scenario corresponds to data availability scenario 1 in Section 1. The signatures in the time period of interest are transferred in time using a stochastic signature model, based on data from the available time period and a random error model to describe associated uncertainty.
• RS, where streamflow observations at the location of interest are not available. The signatures at the location of interest are transferred in space using a stochastic signature model and data from donor catchments, once again with the inclusion of a random error model to describe uncertainty. This scenario corresponds to data availability scenario 2 in Section 1.
In Figure 1, panel (a) describes scenario CS, where signatures are observed, and panel (b) describes the scenarios NCS and RS, where in both cases, the signatures are estimated. We use the term target signatures to indicate, regardless of the scenario considered, the signatures that the precipitation-streamflow model has to "match" during the calibration. The method used to estimate the target signatures varies depending on the scenario.
In scenario CS (panel a), the target signatures $\tilde{y}$ are calculated directly from observed streamflow data $\tilde{q}$, hence $\tilde{y} = g(\tilde{q})$, where g is a (vector-valued) deterministic function that transforms a streamflow time series into a vector of target signatures (e.g., FDC quantiles, baseflow index, or other signatures at the catchment of interest) (Section 2.3.1). The parameters $\theta^{(H)}$ of the precipitation-streamflow model H that generates streamflow predictions $Q^{(H)}$ are inferred in the signature domain, by seeking parameter values that minimize the distance between the modeled signatures $Y^{(H)}_{\mathrm{prior}} = g(Q^{(H)})$ and the observed signatures $\tilde{y} = g(\tilde{q})$. The resulting set of samples from the posterior distribution of the parameters, $\theta^{(H)}_{\mathrm{posterior}}$, can then be used to generate posterior streamflow distributions $Q^{(H)}_{\mathrm{posterior}}$.
In scenarios NCS and RS (panel b), a signature transfer model $Y^{(T)}(\theta^{(T)})$ is employed to estimate the target signatures (refer to Sections 2.3.2 and 2.3.3). The signature model generates the same signatures as the function g but with two caveats: (a) it takes a different set of inputs when streamflow time series are not available and (b) a random error term is included to represent estimation error. The inference process is carried out in two sequential steps. In the first step (shown in the red dashed box), the signature model is calibrated (using data from a different time period in Scenario NCS and data from donor catchments in Scenario RS) and then used to estimate the target signatures $Y^{(T)}_{\mathrm{posterior}}$ at the catchment of interest. In the second step, the precipitation-streamflow model is calibrated to the estimated signatures $Y^{(T)}_{\mathrm{posterior}}$, generated using the pre-calibrated signature model. This step produces samples $\theta^{(H)}_{\mathrm{posterior}}$ from the posterior distribution of the parameters of the precipitation-streamflow model, and in turn, samples $Q^{(H)}_{\mathrm{posterior}}$ from the posterior (predictive) distribution of streamflow time series. A key distinction of scenarios NCS and RS from scenario CS is that the target signatures form a distribution rather than a fixed value, a difference with important theoretical and algorithmic implications, as explained in Section 3.
In addition to the three main scenarios, the case study considers two auxiliary scenarios, namely concomitant hydrograph (CH) and prior simulation (PS). Scenario CH employs classical time-domain calibration where the parameters of the precipitation-streamflow model are inferred using observed streamflow time series. Scenario PS simulates streamflow using model parameters sampled from their prior distribution (i.e., without any "inference"); note that in this scenario residual errors are ignored, leaving (prior) parameter uncertainty of the precipitation-streamflow model as the sole source of predictive uncertainty. Scenarios CH and PS represent, respectively, the most and least constrained parameter estimation setups, and provide further context for the analysis, as described in Section 4.6.
The procedures depicted in Figure 1 are detailed next.

Precipitation-Streamflow Model
The predictive distribution $Q^{(H)} = \{Q^{(H)}_t; t = 1, \ldots, N_T\}$ over $N_T$ time steps is generated by a stochastic precipitation-streamflow model. Typical stochastic models in hydrological modeling are obtained by combining a deterministic model $m^{(H)}$ with a random residual error term $\varepsilon^{(H)}$. In this prototypical example, the stochastic model equation can be written as
$$z(Q^{(H)}; \theta^{(H)}_z) = z(m^{(H)}(\theta^{(H)}_m, \tilde{x}^{(H)}); \theta^{(H)}_z) + \varepsilon^{(H)}(\theta^{(H)}_\varepsilon) \tag{1}$$
where z is a transformation (e.g., Box-Cox) that accounts for the heteroscedasticity and skew in the residuals (e.g., McInerney et al., 2017); its parameters are denoted as $\theta^{(H)}_z$. The precipitation-streamflow model parameters are denoted as $\theta^{(H)}$, and in turn are composed of parameters of the deterministic model $\theta^{(H)}_m$, parameters of the residual error model $\theta^{(H)}_\varepsilon$, and transformation parameters $\theta^{(H)}_z$. The term $\tilde{x}^{(H)}$ denotes all fixed inputs of the model, such as observed precipitation, potential evaporation, etc.
The derivation below is presented for general stochastic models and transformations. Specific modeling choices taken in the case studies are detailed in Section 4.2.

Streamflow Signatures
Consider a set of signatures y, defined from the underlying streamflow time series q through the deterministic function g, such that y = g(q). The choice of signatures represents a modeling choice analogous to the choice of calibration variables. Our choice of signatures for the case study is given by selected quantiles of seasonal FDCs, as detailed in Section 4.3. For consistency, the choice of signatures is kept constant in all scenarios (though their value will naturally change depending on the estimation method).

Scenario "Concomitant Signatures" (CS)
In scenario CS, the signatures are computed directly from observed streamflow time series in the calibration period and the target catchment,
$$\tilde{y} = g(\tilde{q})$$
where g is used to transform observed streamflow $\tilde{q}$ into "observed" signatures $\tilde{y}$.

Scenario "Non-Concomitant Signatures" (NCS)
In scenario NCS, the signatures are derived from observed streamflow time series $\tilde{q}_{tr}$ in the target catchment but in a different time period (with respect to rainfall forcing). The signature transfer model, which combines a deterministic term and an additive random error term to describe uncertainty, is given by:
$$z(Y^{(T)}; \theta^{(T)}_z) = z(g(\tilde{q}_{tr}); \theta^{(T)}_z) + \varepsilon^{(T)}(\theta^{(T)}_\varepsilon) \tag{3}$$
where $Y^{(T)}$ are the estimated signatures, $\theta^{(T)}_\varepsilon$ are the parameters of the signature transfer error model $\varepsilon^{(T)}$, and $\theta^{(T)}$ comprises $\theta^{(T)}_z$ and $\theta^{(T)}_\varepsilon$.
The choices of transformation z and error model ε (T) in Equation 3 are application-specific; our choices for the case study are detailed in Section 4.4.2.

Scenario "Regionalized Signatures" (RS)
In scenario RS, the streamflow signatures are obtained using a stochastic signature model. For simplicity, we again assume that the stochastic model comprises a deterministic term $m^{(T)}$ and an additive random error term $\varepsilon^{(T)}$ to describe its uncertainty.
The stochastic signature model is given by
$$z(Y^{(T)}; \theta^{(T)}_z) = z(m^{(T)}(\theta^{(T)}_m, \tilde{x}^{(T)}); \theta^{(T)}_z) + \varepsilon^{(T)}(\theta^{(T)}_\varepsilon) \tag{4}$$
where $\tilde{x}^{(T)}$ denotes all fixed inputs of the signature model, for example, observed precipitation statistics and landscape characteristics of the target catchment. The model parameters $\theta^{(T)}$ comprise $\theta^{(T)}_m$, $\theta^{(T)}_z$, and $\theta^{(T)}_\varepsilon$.
The specific choices of model m (T) , transformation z, and residual error model ε (T) made in our case studies are detailed in Section 4.4.3.
Note that the deterministic term m (T) estimates the signatures without the use of streamflow (which is unavailable in the target catchment). Hence the function g does not appear in Equation 4.

Bayesian Inference Approach
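The posterior distribution of precipitation-streamflow model parameters, $p(\theta^{(H)} \mid \tilde{y}, \tilde{x})$, is given by Bayes equation as follows,
$$p(\theta^{(H)} \mid \tilde{y}, \tilde{x}) = \frac{p(\tilde{y} \mid \theta^{(H)}, \tilde{x}) \, p(\theta^{(H)})}{p(\tilde{y} \mid \tilde{x})} \tag{5}$$
where $p(\tilde{y} \mid \theta^{(H)}, \tilde{x})$ is the likelihood function, $p(\theta^{(H)})$ is the prior distribution of the model parameters, and $p(\tilde{y} \mid \tilde{x})$ is a normalization constant (given by the marginal distribution of the observed signatures).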
In scenario CS, where the target signatures are deterministic, Equation 5 can be used directly to estimate the posterior of θ (H) ; the algorithmic implementation is described in Section 3.
In contrast, in scenarios NCS and RS, where the target signatures are themselves explicitly formulated as the output of a stochastic model, Equation 5 cannot be applied directly.
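To accommodate scenarios NCS and RS, we formulate the posterior distribution in "expanded" form using the total probability integral,
$$p(\theta^{(H)} \mid \tilde{y}, \tilde{x}) = \int p(\theta^{(H)} \mid y, \tilde{x}) \, p(y \mid \tilde{y}, \tilde{x}) \, dy \tag{6}$$
where $p(\theta^{(H)} \mid \tilde{y}, \tilde{x})$ and $p(\theta^{(H)} \mid y, \tilde{x})$ are the posterior distributions of the precipitation-streamflow model parameters $\theta^{(H)}$ with respect to the observed signatures $\tilde{y}$ and the simulated signatures $y$, respectively. The term $p(y \mid \tilde{y}, \tilde{x})$ represents the distribution of modeled (transferred) signatures in the target catchment given signatures $\tilde{y}$ observed in donor catchments.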
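Expanding the first term in Equation 6 using Bayes equation yields
$$p(\theta^{(H)} \mid \tilde{y}, \tilde{x}) = \int \frac{p(y \mid \theta^{(H)}, \tilde{x}) \, p(\theta^{(H)})}{p(y \mid \tilde{x})} \, p(y \mid \tilde{y}, \tilde{x}) \, dy \tag{7}$$
The three scenarios differ in the choice of the term $p(y \mid \tilde{y}, \tilde{x})$ in Equation 7. In scenario CS, the uncertainty in observed signatures is treated as part of the "lumped" error term $\varepsilon^{(H)}$ already included in the precipitation-streamflow model in Equation 1. In this case, the probability distribution $p(y \mid \tilde{y}, \tilde{x})$ collapses to a Dirac function and the integral in Equation 7 simplifies to the usual Bayesian formulation in Equation 5,
$$p(\theta^{(H)} \mid \tilde{y}, \tilde{x}) = \frac{p(\tilde{y} \mid \theta^{(H)}, \tilde{x}) \, p(\theta^{(H)})}{p(\tilde{y} \mid \tilde{x})} \tag{8}$$
In scenarios NCS and RS, this simplification cannot be made because the uncertainty of the estimated signatures is not part of the "lumped" error term $\varepsilon^{(H)}$. Therefore, the term $p(y \mid \tilde{y}, \tilde{x})$ represents the probability density function of the stochastic signature model $Y^{(T)}$ (denoted in full by $p(Y^{(T)} \mid \hat{\theta}^{(T)}, \tilde{y}, \tilde{x})$ in Equation 9 below), with parameters $\hat{\theta}^{(T)}$ pre-calibrated offline to complementary data (refer to Sections 2.3.2 and 2.3.3). Equation 7 then becomes
$$p(\theta^{(H)} \mid \tilde{y}, \tilde{x}) = \int \frac{p(y \mid \theta^{(H)}, \tilde{x}) \, p(\theta^{(H)})}{p(y \mid \tilde{x})} \, p(Y^{(T)} = y \mid \hat{\theta}^{(T)}, \tilde{y}, \tilde{x}) \, dy \tag{9}$$
The posterior parameter distribution in Equations 8 and 9 is sampled using an ABC procedure as described next in Section 3.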

Sampling Algorithm to Implement Parameter Inference: Modified SABC Algorithm
This section describes the sampling algorithm used to estimate the posterior distribution of parameters θ (H) of the precipitation-streamflow model. Compared to classical time-domain calibration, signature-domain calibration brings the complication that the likelihood function is not available in closed form. Moreover, a signature itself can be treated either as a fixed given value (scenario CS) or as a random variable to represent estimation uncertainty (scenarios NCS and RS).
The ABC sampling algorithm used in this work is based on the SABC algorithm proposed in Albert et al. (2015), but employs a modification to take into account the use of stochastic (rather than fixed) target signatures.
Figure 2 illustrates the algorithm. Panel (a) shows the workflow applied in the case of deterministic signatures (scenario CS); for this scenario, the procedure is the same as in the earlier studies by Kavetski et al. (2018) and Fenicia et al. (2018). Panel (b) shows the algorithm modifications developed to handle stochastic signatures (scenarios NCS and RS).
In both cases, the algorithm is composed of three separate stages:
1. Calculation of the target signatures (green box). This step is carried out using observed streamflow data in scenario CS and using the signature models in scenarios NCS and RS;
2. Determination of an initial population of $N_{sam}$ model parameters $\theta^{(H)}$, which is obtained by sampling parameter sets from the prior distribution and retaining parameter sets where the distance ρ between simulated and target signatures is below a prescribed tolerance (red box). Note that this procedure inevitably yields different initial populations for different scenarios (due to differences in signature values, and hence in distance metric values and in the satisfaction of the prescribed tolerance);
3. Evolution of the population of particles $\theta^{(H)}$ using a Metropolis step with the acceptance/rejection tolerance evolving according to the annealing schedule described in Albert et al. (2015) (blue box).
Intuitively, the SABC algorithm works as follows. We seek to generate samples (parameter sets and corresponding streamflow time series and signatures) that meet a signature distance tolerance that is as tight as possible (ideally zero though this is impossible in practice). To obtain such samples, Stage 2 generates an initial population of samples that meet a loose tolerance and then Stage 3 progressively "evolves" these samples toward values that meet tighter tolerances. For the overall algorithm to be computationally viable, the tolerance in Stage 2 must be sufficiently loose that the initial population can be generated in a reasonable time. As the SABC algorithm progresses toward convergence, the tolerance is tightened and the influence of the distance metric on the parameter inference decreases (see Albert et al. (2015) and Kavetski et al. (2018) for details).
A detailed description of the standard SABC algorithm shown in panel (a) is provided in Albert et al. (2015); see also Kavetski et al. (2018). The presentation here focuses on the modifications designed in this work to adapt the SABC algorithm to the case of stochastic target signatures (panel b). Further technical details and specific algorithmic choices are provided in Section 4.
The key difference is in the computation of the target signatures. In scenario CS (panel a, green box), a single fixed set of signatures is calculated from observed streamflow data. In scenarios NCS and RS (panel b, green box), a population of $N_{sam}$ sets of target signatures is sampled from the pre-calibrated signature model. This step is indicated with a gray background in Figure 2. This population is kept fixed throughout the SABC computation. Each particle $\theta^{(H)}$ is associated with one of these target signature sets $\tilde{y}$, and is evolved by the SABC algorithm to minimize the distance between its simulated signature set and that target signature set. This modification incorporates the uncertainty of the stochastic signatures into the SABC sampling algorithm.
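The pairing of particles with target signature sets can be illustrated with a toy sampler. The sketch below is not the SABC algorithm of Albert et al. (2015): it replaces the annealing schedule with a fixed, user-supplied sequence of tolerances and a plain random-walk ABC-MCMC move, and it assumes a uniform prior. The function name `abc_paired_targets`, the maximum-absolute-difference distance, and all arguments are hypothetical choices made for illustration only.

```python
import numpy as np

def abc_paired_targets(simulate, targets, lower, upper, tolerances,
                       n_moves=20, step=0.1, seed=0):
    """Toy ABC sampler in which particle i is tied to target signature set i."""
    rng = np.random.default_rng(seed)
    targets = np.atleast_2d(np.asarray(targets, float))
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    n_sam = targets.shape[0]

    def dist(theta, target):
        # Hypothetical distance: largest absolute signature mismatch.
        return np.max(np.abs(simulate(theta) - target))

    # Stage 2: rejection sampling from the uniform prior at the loosest tolerance.
    particles = np.empty((n_sam, lower.size))
    for i in range(n_sam):
        while True:
            theta = rng.uniform(lower, upper)
            if dist(theta, targets[i]) <= tolerances[0]:
                particles[i] = theta
                break

    # Stage 3 (simplified): random-walk moves under progressively tighter tolerances.
    # With a uniform prior and a symmetric proposal, a move is accepted when it
    # stays inside the prior bounds and meets the current tolerance for ITS OWN target.
    for tol in tolerances[1:]:
        for _ in range(n_moves):
            for i in range(n_sam):
                prop = particles[i] + step * (upper - lower) * rng.standard_normal(lower.size)
                inside = np.all((prop >= lower) & (prop <= upper))
                if inside and dist(prop, targets[i]) <= tol:
                    particles[i] = prop
    return particles
```

Passing a single repeated target row reproduces the fixed-signature case (scenario CS), whereas passing draws from a stochastic signature model propagates the spread of the targets into the spread of the particles.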
Once sampling is complete, the predictive streamflow distribution can be generated for any period of interest. See Appendix A for the computational procedure.

Catchments
The proposed inference approach is tested in the Thur basin, which is an alpine and pre-alpine catchment located in north-eastern Switzerland, south of Lake Constance. Within this basin we selected six catchments, as shown in Figure 3.
Although some of the catchments are nested, they present substantial variability in streamflow signatures including average streamflow (1.64-4.14 mm/d), baseflow index (0.42-0.57) and seasonality (the mean half streamflow date varying between 168 and 221 days), as shown in previous work (Dal Molin, Schirmer, et al., 2020).
The catchment has high quality observation data, having been studied intensively in the last 40 years. Detailed physical characteristics of the Thur basin and a summary of previous investigations can be found in Dal Molin, Schirmer, et al. (2020). The work of Doulatyari et al. (2017) on the estimation of FDCs is particularly relevant to this case study, as detailed in Section 4.4.3. All the time series used in this study are daily and span the period 1981-2005, with only one gap in the streamflow data in Herisau, from 31 December 1982 to 9 May 1983. The 24 years of data are divided into three periods of 8 years, which are used to create "virtual" ungauged scenarios (see Section 4.5 for details).

Precipitation-Streamflow Model
The deterministic precipitation-streamflow model m (H) used in Equation 1 is based on the popular HyMod model (Boyle, 2003), with the additional inclusion of a snow reservoir. This model provides reasonable fits to the observed data in the Thur basin (Dal Molin, Schirmer, et al., 2020). Importantly, as a lumped model, it has low computational and data requirements.
The model has four elements, shown schematically in Figure 4. The snow reservoir (WR) intercepts the incoming precipitation and releases it according to the input temperature, in order to simulate snow accumulation and melting. The unsaturated reservoir (UR) partitions the combined precipitation and snowmelt into a flux that builds storage, which eventually evaporates, and a flux that is propagated to the downstream elements and eventually produces streamflow. The latter flux is partitioned between a cascade of three fast reservoirs (FR) and a slow reservoir (SR). The FR are intended to generate the peaks of the hydrograph and their offset, while the SR is intended to produce the baseflow.
The model is implemented using the SUPERFLEX modeling framework (Dal Molin, Kavetski, & Fenicia, 2020; Fenicia et al., 2011). A fixed step implicit Euler time stepping scheme is used for numerical stability. The equations and the calibrated parameters are detailed in Appendix B.
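To illustrate why an implicit Euler scheme is attractive for reservoir models, the sketch below applies it to a single linear reservoir, dS/dt = P − kS with outflow Q = kS. This is a generic textbook example and not the SUPERFLEX implementation or the model equations of Appendix B; the function name and arguments are hypothetical.

```python
import numpy as np

def linear_reservoir_implicit_euler(inflow, k, dt=1.0, s0=0.0):
    """Route inflow through dS/dt = inflow - k*S and return the outflow Q = k*S."""
    storage = s0
    outflow = np.empty(len(inflow))
    for t, p in enumerate(inflow):
        # Implicit Euler: S_new = S_old + dt * (p - k * S_new), solved for S_new.
        storage = (storage + dt * p) / (1.0 + k * dt)
        outflow[t] = k * storage
    return outflow
```

Because the update divides by 1 + k·dt, the storage remains non-negative and bounded for any positive k and dt, whereas an explicit step can oscillate or turn negative once k·dt exceeds 1.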
The streamflow residual error model is built to take into account heteroscedasticity, skew and autocorrelation. The Box-Cox transformation (Box & Cox, 1964),
$$z(q; \lambda) = \frac{q^{\lambda} - 1}{\lambda}$$
is used in Equation 1. The power parameter λ is fixed to 0.2 (McInerney et al., 2017).
The autocorrelation is represented using a first-order autoregressive model (AR1),
$$\varepsilon^{(H)}_t = \phi^{(H)} \varepsilon^{(H)}_{t-1} + W^{(H)}_t$$
where $\varepsilon^{(H)}_t$ is the residual error at the time step t, $\phi^{(H)}$ is the autoregressive parameter and $W^{(H)}_t$ is the innovation (random noise term).
The innovations are assumed to follow a lower-truncated Gaussian distribution, with zero mean, variance $(\sigma^{(H)}_W)^2$, and lower bound $L_{W,t}$ set such that $Q^{(H)}_t \geq 0$.
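The residual error model described above can be sketched as a sampler that turns a deterministic hydrograph into one stochastic replicate. The code assumes the standard Box-Cox form z(q) = (q^λ − 1)/λ, so that zero streamflow maps to −1/λ, and draws each innovation by inverse-CDF sampling from a Gaussian truncated at the bound that keeps streamflow non-negative. Function and argument names are hypothetical, and starting the AR1 process from a zero residual is an assumption of this sketch.

```python
import numpy as np
from statistics import NormalDist

def box_cox(q, lam=0.2):
    return (np.asarray(q, float) ** lam - 1.0) / lam

def inv_box_cox(z, lam=0.2):
    # Clip at zero so that rounding errors cannot produce negative streamflow.
    return np.maximum(lam * np.asarray(z, float) + 1.0, 0.0) ** (1.0 / lam)

def sample_streamflow(q_det, phi, sigma_w, lam=0.2, seed=0):
    """Draw one streamflow replicate around the deterministic simulation q_det."""
    rng = np.random.default_rng(seed)
    std_normal = NormalDist()
    z_det = box_cox(q_det, lam)
    z_min = -1.0 / lam              # Box-Cox image of zero streamflow
    eps_prev = 0.0                  # assumed initial residual
    z_sim = np.empty(len(z_det))
    for t, z_t in enumerate(z_det):
        # Lower bound on the innovation that keeps streamflow non-negative.
        lower = z_min - z_t - phi * eps_prev
        p_lower = std_normal.cdf(lower / sigma_w)
        # Inverse-CDF draw from the Gaussian truncated to [lower, inf).
        u = p_lower + (1.0 - p_lower) * rng.uniform()
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        w_t = sigma_w * std_normal.inv_cdf(u)
        eps_prev = phi * eps_prev + w_t      # AR1 update of the residual
        z_sim[t] = z_t + eps_prev
    return inv_box_cox(z_sim, lam)
```

Repeating the call with different seeds and with parameter sets drawn from the posterior would give an ensemble of replicates from which predictive limits can be computed.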

Definition of Streamflow Signatures
The streamflow signatures are derived from the seasonal FDCs. To avoid ambiguity, we define a seasonal FDC, $\psi_s(q)$, as the cumulative distribution function (CDF) of a streamflow time series in season s,
$$\psi_s(q) = \Pr[Q_s \leq q]$$
where $Q_s$ is a random variable representing seasonal streamflow.
Simulated FDCs are calculated directly from simulated streamflow time series. Target FDCs are either calculated from an observed streamflow time series (scenario CS) or estimated using a model (scenarios NCS and RS). For example, in scenario CS, the streamflow time series q is divided in four seasonal partitions $q_s$, where $s = 1, \ldots, N_{season}$ is the season and $N_{season} = 4$. Season 1 (autumn) is defined as September, October and November; Season 2 (winter) is defined as December, January and February; and so forth.
The signatures y are defined by the slopes between selected (consecutive) quantiles of the seasonal FDCs,
$$y_{[j,s]} = \frac{\psi_s^{-1}(\varsigma_j) - \psi_s^{-1}(\varsigma_{j-1})}{\varsigma_j - \varsigma_{j-1}} \tag{14}$$
where $\psi^{-1}(\varsigma)$ is the inverse FDC, that is, the streamflow at quantile $\varsigma$. The "matrix subscript" notation [j, s] is used in Equation 14 to refer to the jth signature in season s.
The quantiles $\varsigma_j$ in Equation 14 are selected according to Equation 15, which provides higher resolution of quantiles for higher streamflow (e.g., Westerberg et al., 2011). Following Westerberg et al. (2011), a total of $N_{FDC} = 19$ slopes are calculated according to Equation 14: $N_{FDC} - 1$ slopes between the quantiles in Equation 15, and additionally the slope between (0,0) and $(\varsigma_1, \psi_s^{-1}(\varsigma_1))$. The choice of $N_{FDC}$ provides a balance between the benefits of a high resolution of the FDC versus the increased computational costs. Note that the extremal quantiles 0 and 1, which correspond to the maximum and minimum streamflow, are excluded due to their particular volatility and uncertainty.
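Since in other equations the signatures are assumed to be concatenated into a single vector $y = \{y_i; i = 1, \ldots, N_y\}$, the mapping $y_i = y_{[j,s]}$ holds for $s = \mathrm{floor}((i - 1)/N_{FDC}) + 1$ and $j = i - (s - 1) N_{FDC}$, where floor is the integer rounding down function. The shift by one inside floor ensures that the last signature of each season ($i = N_{FDC}, 2N_{FDC}, \ldots$) maps to $j = N_{FDC}$ in that season rather than to $j = 0$ in the next one.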
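The index mapping between the flat signature vector and the [j, s] notation can be checked with a few lines of code. The sketch below uses 1-based indices and floor((i − 1)/N_FDC), so that the last slope of each season stays in that season; the function names are hypothetical.

```python
def flat_to_season_index(i, n_fdc=19):
    """Map the 1-based flat index i to the 1-based pair (j, s)."""
    s = (i - 1) // n_fdc + 1
    j = i - (s - 1) * n_fdc
    return j, s

def season_to_flat_index(j, s, n_fdc=19):
    """Inverse mapping from (j, s) back to the 1-based flat index i."""
    return (s - 1) * n_fdc + j
```

With four seasons and 19 slopes per season, the flat index runs from 1 to 76, and index 19 maps to the last slope of season 1 while index 20 maps to the first slope of season 2.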
The use of FDC slopes instead of FDC quantiles, as shown in Equation 14, yields a better-behaved signature error model in scenarios NCS and RS. In particular, from an algorithmic perspective, monotonicity of the FDC is specified more easily by requiring the slopes to be positive rather than by imposing relational constraints between the quantiles.
Finally, note that the use of FDC slopes instead of FDC quantiles does not result in a loss of information, as there is a one-to-one correspondence between slopes and quantiles. The definition of the first slope, j = 1 in Equation 14, distinguishes "parallel" FDCs (with matching slopes everywhere except the first segment).
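The computation of seasonal FDC slopes from a daily series can be sketched as follows. Two assumptions are made here that the text does not fix: the slope is taken as the change in streamflow per unit change in quantile, and the quantile levels are supplied by the caller (the quantile selection rule of Equation 15 is not reproduced). Function names and the month-to-season lookup are hypothetical, with seasons 3 and 4 taken as spring and summer.

```python
import numpy as np

# Season 1: Sep-Nov, season 2: Dec-Feb, season 3: Mar-May, season 4: Jun-Aug.
SEASON_OF_MONTH = {9: 1, 10: 1, 11: 1, 12: 2, 1: 2, 2: 2,
                   3: 3, 4: 3, 5: 3, 6: 4, 7: 4, 8: 4}

def fdc_slopes(q, quantiles):
    """Slopes of the inverse FDC between consecutive quantiles, starting from (0, 0)."""
    quantiles = np.asarray(quantiles, float)
    flows = np.quantile(np.asarray(q, float), quantiles)  # streamflow at each quantile
    x = np.concatenate(([0.0], quantiles))
    y = np.concatenate(([0.0], flows))
    return np.diff(y) / np.diff(x)

def seasonal_signatures(months, q, quantiles):
    """Concatenate the FDC slopes of the four seasons into one signature vector."""
    q = np.asarray(q, float)
    seasons = np.array([SEASON_OF_MONTH[int(m)] for m in months])
    return np.concatenate([fdc_slopes(q[seasons == s], quantiles)
                           for s in (1, 2, 3, 4)])
```

Multiplying the slopes by the quantile increments and accumulating them recovers the quantile flows, which is the one-to-one correspondence noted above.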
The distance metric is set to the maximum of the seasonally averaged distances between observed and simulated FDC slopes. A normalization is used to scale the slopes by their estimated variability. See Appendix C for details.

Scenario CS (Concomitant Signatures)
The signatures are computed as
$$\tilde{y} = g(\tilde{q})$$
where the function g is as defined earlier in Section 2.3.

Scenario NCS (Non-Concomitant Signatures)
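The model for NCS is formulated based on Equation 3 as follows,
$$z(Y^{(T)}; \lambda^{(T)}_{BC}) = z(g(\tilde{q}_{tr}); \lambda^{(T)}_{BC}) + \varepsilon^{(T)}(\sigma^{(T)}_\varepsilon)$$
where $\tilde{q}_{tr}$ is streamflow from the same catchment and season but a different time period. The Box-Cox parameter is set as $\lambda^{(T)}_{BC} = 0.2$. The error standard deviation $\sigma^{(T)}_\varepsilon$ is inferred in the same catchment in a period where streamflow is available (see Section 4.5).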
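Sampling target signatures for scenario NCS can be sketched as adding noise in Box-Cox space to the signatures computed from the other time period and transforming back. The text specifies the Box-Cox parameter and an error standard deviation but not the error distribution, so the Gaussian errors below are an assumption of this sketch, as are the function and argument names.

```python
import numpy as np

def sample_ncs_signatures(y_tr, sigma_eps, n_sam, lam_bc=0.2, seed=0):
    """Draw n_sam target signature sets around y_tr = g(q_tr)."""
    rng = np.random.default_rng(seed)
    y_tr = np.asarray(y_tr, float)
    z_tr = (y_tr ** lam_bc - 1.0) / lam_bc                 # Box-Cox transform
    # Assumed error model: independent zero-mean Gaussian noise in transformed space.
    z = z_tr + rng.normal(0.0, sigma_eps, size=(n_sam, y_tr.size))
    # Back-transform; clipping at zero guards against negative signatures.
    return np.maximum(lam_bc * z + 1.0, 0.0) ** (1.0 / lam_bc)
```

Each row of the returned array is one target signature set, and the rows can be paired one-to-one with the particles of the sampling algorithm.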

Scenario RS (Regionalized Signatures)
The deterministic FDC model in this study is based on the model proposed by Botter et al. (2009) and subsequently applied in the Thur basin (Doulatyari et al., 2017). This parametric model defines the probability density function (pdf) of streamflow q, up to a constant, as follows,
$$p(q) \propto q^{-a} \exp\left[-\frac{q^{2-a}}{\alpha k (2-a)} + \frac{L \, q^{1-a}}{k (1-a)}\right] \tag{18}$$
where parameters α and L represent, respectively, the mean precipitation and the frequency of effective precipitation events (i.e., precipitation events that generate streamflow); k and a represent the coefficient and the exponent of the hydrograph recession.
This model does not make use of streamflow data, and therefore can be used for cases where streamflow data is not available (data availability scenario 3 in Section 1).As described by Doulatyari et al. (2017), parameters α and L are estimated from daily rainfall and snowfall time series (the model uses precipitation without distinguishing its form).Parameters k and a are estimated from geomorphic properties of the catchment.
The model in Equation 18 is applied on a seasonal basis.We used the seasonal parameter values reported in Table 2, column "Modeled," of Doulatyari et al. (2017) for the Thur basin, where parameter λ corresponds to parameter L as defined in this presentation.
The deterministic model m (T) used in Equation 4 is obtained by numerical integration of the PDF in Equation 18with respect to q in order to calculate the corresponding CDF and hence the (seasonal) FDC, which in turn is used in Equation 14to calculate the signatures.
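The numerical integration step can be illustrated with a short Python sketch. The unnormalized pdf coded below is the form commonly quoted for the nonlinear-recession model of Botter et al. (2009), namely p(q) ∝ q^(−a) exp[−q^(2−a)/(αk(2−a)) + L q^(1−a)/(k(1−a))]; it is written here from memory as an assumption and should be checked against Equation 18. The function name seasonal_fdc, the integration grid, and the parameter values are hypothetical.

```python
import numpy as np


def seasonal_fdc(alpha, freq, k, a, exceedance,
                 q_min=1e-4, q_max=1e3, n_grid=20000):
    """Streamflow quantiles for given exceedance probabilities.

    `freq` plays the role of parameter L in the text. The pdf form is an
    assumption (see lead-in) and is only needed up to a constant.
    """
    q = np.logspace(np.log10(q_min), np.log10(q_max), n_grid)
    # Logarithm of the unnormalized pdf (assumed form).
    log_p = (-a * np.log(q)
             - q ** (2.0 - a) / (alpha * k * (2.0 - a))
             + freq * q ** (1.0 - a) / (k * (1.0 - a)))
    # Subtract the maximum before exponentiating to avoid overflow.
    p = np.exp(log_p - log_p.max())
    # Cumulative trapezoidal integration gives the unnormalized CDF.
    increments = 0.5 * (p[1:] + p[:-1]) * np.diff(q)
    cdf = np.concatenate(([0.0], np.cumsum(increments)))
    cdf /= cdf[-1]  # normalization removes the unknown constant
    # Flow exceeded with probability e is the CDF quantile at 1 - e.
    return np.interp(1.0 - np.asarray(exceedance, dtype=float), cdf, q)


# Hypothetical parameter values, for illustration only.
fdc = seasonal_fdc(alpha=10.0, freq=0.3, k=0.05, a=1.5,
                   exceedance=[0.05, 0.5, 0.95])
```

Because the pdf is known only up to a constant, the normalization is done numerically by dividing the cumulative integral by its final value; the grid limits must be wide enough to contain essentially all of the probability mass.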
The specific choices regarding transformation z and error model ε (T) in Equation 4 are motivated by residual analysis of model m (T) in the gauged catchments. These analyses showed that the residuals are heteroscedastic and characterized by a consistent difference (bias) between observed and simulated FDCs (e.g., Figure 4 in Doulatyari et al., 2017). Here, we chose to represent heteroscedasticity by making the standard deviation of the residuals proportional to the magnitude of the simulated signatures, where the proportionality constant can be interpreted loosely as a relative error. As a consequence of this parameterization, the transformation z is not needed.
The signature model is "bias-corrected" by applying a multiplier to the modeled signatures. Equation 4 therefore simplifies to a form in which the modeled signatures are scaled by this multiplier and perturbed by the proportional error. The inferred parameters of the error model are the bias-correction multiplier and the proportionality constant of the error standard deviation.
In order to estimate the error term, we use data from neighboring catchments, meaning that the stochastic signature transfer model belongs to the data availability scenario 2 in Section 1.
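A rough Python sketch of such a stochastic signature model is given below. It assumes Gaussian errors and assumes that the standard deviation is proportional to the bias-corrected signature; both are illustrative choices rather than the exact form of Equation 4, and the names sample_signatures, bias_multiplier, and relative_error are hypothetical.

```python
import numpy as np


def sample_signatures(modeled, bias_multiplier, relative_error,
                      n_samples, seed=0):
    """Draw stochastic signatures around bias-corrected modeled signatures."""
    rng = np.random.default_rng(seed)
    modeled = np.asarray(modeled, dtype=float)
    # Bias correction: scale the deterministic signatures by a multiplier.
    center = bias_multiplier * modeled
    # Heteroscedastic error: standard deviation proportional to magnitude
    # (assumed here to act on the bias-corrected value).
    std = relative_error * np.abs(center)
    noise = rng.standard_normal((n_samples, modeled.size))
    return center + std * noise


# Made-up modeled signatures and parameter values.
modeled = np.array([0.8, 0.4, 0.1])
samples = sample_signatures(modeled, bias_multiplier=1.2,
                            relative_error=0.2, n_samples=20000)
```

Setting bias_multiplier to 1 in this sketch corresponds to switching off the bias correction while keeping the proportional error.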

Construction of Ungauged Scenarios for the Calibration of the Signature Models
In order to construct "virtual" ungauged scenarios, the 24-year-long period of streamflow data is divided into three periods: P1, from 1 September 1981 to 31 August 1989; P2, from 1 September 1989 to 31 August 1997; P3, from 1 September 1997 to 31 August 2005.
The following procedure is then applied for the three scenarios:
• Scenario CS: when θ (H) are needed for a given catchment and a given period, the model is calibrated to observed signatures in that catchment and that period.
• Scenario NCS: when θ (H) are needed in P1 for a given catchment, the signature transfer model in Equation 3 uses inputs qtr from P2, and is conditioned on signatures ỹ observed in P3. We then rotate the three periods in order to run the inference for all the time series of a given catchment, and apply this procedure to all catchments.
• Scenario RS: when θ (H) are needed in a specific catchment for a given period, the signature model in Equation 4 is conditioned on signatures ỹ observed in the other five catchments. We then rotate the six catchments in order to run the inference in all catchments for a given period, and apply this procedure to all periods.
This procedure generates a set of calibrated signature-model parameters that is specific to each catchment and each time period. These parameters are kept fixed at their calibrated values when inferring θ (H) with the procedure described in Section 3. The model diagnostic metrics described in Section 4.7 are calculated over the entire 24-year-long period, by concatenating the predictions in periods P1, P2, and P3 into a single time series, and comparing this concatenated time series to the corresponding (single) observed time series.
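The rotation logic of scenarios NCS and RS can be sketched in a few lines of Python. The cyclic ordering of the periods in the NCS rotation and the catchment labels C1 to C6 are assumptions made for illustration; only the P1/P2/P3 example and the leave-one-catchment-out structure come from the description above.

```python
PERIODS = ("P1", "P2", "P3")
# Placeholder labels for the six catchments (hypothetical names).
CATCHMENTS = ("C1", "C2", "C3", "C4", "C5", "C6")


def ncs_rotations(periods):
    """Period roles for scenario NCS, assuming a cyclic rotation."""
    n = len(periods)
    return [
        {
            "target": periods[i],                    # where parameters are needed
            "transfer_input": periods[(i + 1) % n],  # streamflow fed to the transfer model
            "conditioning": periods[(i + 2) % n],    # signatures used to fit its error model
        }
        for i in range(n)
    ]


def rs_rotations(catchments, periods):
    """Leave-one-catchment-out plan for scenario RS, repeated per period."""
    plan = []
    for period in periods:
        for target in catchments:
            donors = [c for c in catchments if c != target]
            plan.append({"period": period, "target": target, "donors": donors})
    return plan


ncs_plan = ncs_rotations(PERIODS)
rs_plan = rs_rotations(CATCHMENTS, PERIODS)
```

With three periods and six catchments, the RS plan contains 18 calibrations of the signature model, each using five donor catchments.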
The number of SABC samples (see Section 3) is set to N sam = 5,000 following our previous work (Fenicia et al., 2018), where it was found appropriate to characterize the posterior distribution of a HyMod-like model. Note that, with these settings, convergence of the SABC algorithm requires approximately 4 million hydrological model runs.
The CS scenario serves as a baseline of ideal performance given the selection of model and signatures. The NCS and RS scenarios use signature transfer in time and in space, respectively, and can therefore be considered as validation scenarios. It would in principle be possible to create an even more challenging validation scenario where signatures are transferred both in time and in space, but we did not test this scenario in this work.

Experiments
Three experiments are carried out:
• Experiment 1 compares model performance in the three signature-domain calibration scenarios (CS, NCS, and RS) in order to appraise potential loss of quality in model predictions when moving from gauged to ungauged conditions.
• Experiment 2 compares signature-domain calibration in scenario CS with time-domain calibration (scenario CH), in order to appraise potential loss of quality in model predictions when moving from time-domain calibration to signature-domain calibration.
• Experiment 3 compares model performance in scenario RS with scenario PS, where the prior predictive distribution is used, that is, hydrographs generated using parameters sampled from the prior distribution. This experiment helps assess the extent to which calibration to RS is informative beyond what is already known a priori.
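To make the notion of a prior predictive distribution (scenario PS) concrete, the sketch below samples a single parameter from a uniform prior and propagates it through a toy linear reservoir. The reservoir is only a stand-in for the precipitation-streamflow model used in this study, and the function names, prior range, and precipitation values are hypothetical.

```python
import numpy as np


def linear_reservoir(precip, k):
    """Toy model: outflow is a fixed fraction k of storage at each step."""
    storage = 0.0
    flow = np.empty(len(precip))
    for t, p in enumerate(precip):
        storage += p
        flow[t] = k * storage
        storage -= flow[t]
    return flow


def prior_predictive(precip, k_low, k_high, n_samples, seed=0):
    """5%, 50%, and 95% streamflow quantiles under a uniform prior on k."""
    rng = np.random.default_rng(seed)
    ks = rng.uniform(k_low, k_high, n_samples)
    sims = np.array([linear_reservoir(precip, k) for k in ks])
    lower, median, upper = np.percentile(sims, [5, 50, 95], axis=0)
    return lower, median, upper


# Made-up precipitation series, for illustration only.
precip = np.array([0.0, 10.0, 0.0, 0.0, 5.0, 0.0])
lower, median, upper = prior_predictive(precip, 0.05, 0.9, n_samples=500)
```

In this sketch the spread between the lower and upper bounds arises solely from parameter uncertainty, since no residual error is added to the simulated flows.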
Experiment 1 is the main experiment of this study. Experiments 2 and 3 are auxiliary experiments that represent, respectively, the most and least constrained parameter estimation setups, and provide additional context to Experiment 1. Note that, apart from the modifications described in Section 3, the SABC algorithmic settings were the same in all scenarios.
The results are reported using two levels of detail. First, we report the result for a representative catchment and time period, namely Andelfingen and time period P3 (see Section 4.5), where we provide details including hydrographs and annual FDCs. Second, we consider all catchments and time periods, and report performance using the metrics defined in Section 4.7 applied to the concatenated time series as described in Section 4.5.
For Experiment 1, we include a comparison of the posterior distribution of the model parameters in the three scenarios. Specifically, we illustrate cases of parameters being well identified or weakly identified in all scenarios and cases where parameter identifiability varies by scenario.

Performance Metrics for a Posteriori Evaluation of Streamflow Predictions
The quality of streamflow predictions in all experiments and scenarios is evaluated using a set of performance metrics. We report model performance in terms of both streamflow time series (hydrograph) and their FDC (expressed as a cumulative distribution). We distinguish metrics that characterize the fit to the data from metrics that characterize the uncertainty.
The fit of the simulations to the observations is quantified using:
• NSE of the median of the predictive distribution. The NSE is chosen because it is a standard metric of model performance in hydrological applications;
• Volumetric bias, which measures the long-term water balance error of simulations.
The uncertainty of the streamflow predictions is assessed through the following metrics:
• Reliability, which measures the statistical consistency between the observations and the predictive distribution (i.e., the degree to which observations are consistent with being samples from the predictive distribution);
• Precision, which measures the (average) spread of the predictive distribution.
The NSE and bias metrics are common in hydrology, and the uncertainty quantification metrics are also well established (e.g., Ehlers et al., 2019; McInerney et al., 2017; Oliveira et al., 2018). The metrics are detailed further in Appendix D.
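For orientation, the four metrics could be computed along the following lines. The NSE is the standard definition; the forms used for volumetric bias, precision, and reliability are common choices in the literature cited above and are assumptions here, since the exact definitions are those of Appendix D. All function names are hypothetical.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency (1 is a perfect fit)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def volumetric_bias(obs, sim):
    """Relative long-term water balance error (0 is best; assumed form)."""
    return abs(np.sum(sim) - np.sum(obs)) / np.sum(obs)


def precision(obs, ensemble):
    """Mean predictive spread relative to mean observation (assumed form).

    `ensemble` has shape (n_members, n_times); smaller values are sharper.
    """
    return np.mean(np.std(ensemble, axis=0)) / np.mean(obs)


def reliability(obs, ensemble):
    """Departure of the predictive p-values from uniformity (assumed form).

    0 means the observations behave like samples from the predictions.
    """
    # p-value of each observation within its predictive ensemble.
    p_values = np.sort(np.mean(ensemble <= obs, axis=0))
    n = p_values.size
    uniform = (np.arange(1, n + 1) - 0.5) / n
    return 2.0 * np.mean(np.abs(p_values - uniform))
```

With these conventions, lower values indicate better reliability, precision, and volumetric bias, whereas higher values indicate better NSE.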
To avoid confusion, we emphasize that the performance metrics in this section are used to evaluate model predictions a posteriori (after the calibration), and are not used in the model calibration itself. Model calibration is instead based on the ABC approach, which matches the selected signatures (FDC slopes) according to the ABC distance metric, as described in Section 3 (general principles) and Appendix C (specific case study choices).

Representative Catchment: Hydrograph and FDC Representation
The hydrographs in panel (a) show a consistent difference in the width of the streamflow uncertainty bounds between scenario CS and the other two scenarios. Scenario CS has the narrowest predictive uncertainty, scenario RS comes second, followed closely by scenario NCS, whose uncertainty bounds enclose those of the other two. This difference in (relative) magnitude of uncertainty is also confirmed by the precision metric of the hydrograph (panel d), which is the lowest in scenario CS (0.42) and assumes comparable values in the other two scenarios (0.57 for scenario NCS and 0.56 for scenario RS). The predictions are generally reliable, with the best value of the reliability metric in scenario CS (0.049), followed by scenario RS (0.12) and then scenario NCS (0.13).
Panel (a) also shows that the median simulated hydrograph (dashed lines) in scenario CS captures the observed hydrograph "dynamics" (i.e., reaction to rainfall events) much better, especially for high streamflow. In scenarios NCS and RS the median hydrograph is less dynamic, with deeper recession and limited ability to reach the observed peaks. This qualitative assessment is confirmed by the value of the NSE (panel c), which is highest in scenario CS (0.64), followed by scenarios RS and NCS, both with a value of 0.59.
In terms of FDC performance, all scenarios tend to suffer from underestimated high streamflow. Apart from that, the comparison of FDCs shows results consistent with the comparison of hydrographs. In particular, scenario CS produces the most precise predictive distributions and has the median closest to the observed values. Scenarios NCS and RS produce wider predictive distributions with comparable precision to each other. Note that, due to the logarithmic scale of the vertical axis, the highest 1% of the streamflow values occupies almost half of the plot area; hence a large visual mismatch may result in a limited difference in the metrics.
These qualitative findings are confirmed by the quantitative metrics shown in panels (c and d):
• The reliability of the FDC predictions is generally worse than for the hydrographs, with the best value obtained for scenario CS (0.35), followed by RS (0.42) and NCS (0.44).
• The FDC precision metric has similar values for scenarios NCS and RS (0.17 and 0.18), while it improves for scenario CS (0.09).
• The FDC NSE is nearly identical in scenarios CS and NCS (both 0.92 after rounding) and slightly lower in scenario RS (0.91).
• The volumetric bias is substantially higher in scenario NCS and confirms the general tendency of the simulations, seen in the FDCs, to underestimate streamflow.

Performance Over All Catchments
Figure 6 shows the performance metrics calculated for all the catchments in all periods. The figure is subdivided into seven panels, with the left column showing the metrics calculated on the hydrograph and the right column showing the FDC metrics. Each point in the plots represents the value of the metric for the simulations of a specific catchment and period. Scenarios are evaluated based on the mean position and spread of the corresponding group of points (metric values). A position toward the right-hand side of the plot indicates better mean performance (note that for precision, reliability, and volumetric bias, the horizontal axes are flipped so that "better" results are on the right, for consistency with the other metrics). The spread of metric values reflects the consistency of simulations for the given scenario. In all scenarios, the points that refer to the same catchment (same color) tend to be clustered together.
Panel (b) shows the precision metric of the hydrograph. In this case, there is a clear trend among the scenarios, both in terms of mean position and spread, with scenario CS achieving the "best" performance (mean precision of 0.46 with standard deviation of 0.047), NCS coming second (mean of 0.62, standard deviation 0.058) and RS third (mean 0.63, standard deviation 0.084).
Panels (c) and (d) show the reliability metric of the hydrograph and the volumetric bias. For both metrics, all three scenarios perform similarly in terms of mean value and spread. We note that the simulations in the catchment Herisau in period P3 in scenario NCS have considerably worse performance metrics than the other catchments in the same scenario.
Panel (e) shows the NSE of the median FDC. There is a clear trend in the scenarios: CS achieves the highest mean (0.96) with low spread (standard deviation 0.023), NCS achieves a similar mean (0.95) but with a larger spread (standard deviation 0.035), and RS deteriorates somewhat both in the mean (0.89) and in the standard deviation (0.062).
Panel (f) shows the precision metric of the FDC. Scenarios NCS and RS achieve a similar performance in terms of mean (0.17 for both) and spread (standard deviation 0.022 and 0.025), although scenario RS has two outliers that perform substantially better than the rest. Both scenarios NCS and RS are strongly outperformed by scenario CS (mean 0.10, standard deviation 0.01), with almost no simulation performing better in scenarios NCS and RS than in scenario CS.
Panel (g), finally, shows the reliability metric of the FDC. All the scenarios perform similarly, except for one poorly performing outlier in scenario NCS (Herisau, period P3).

Posterior Distribution of the Model Parameters
Figure 7 shows the posterior distributions of selected parameters in the three scenarios. The parameters have been selected as representative of three common behaviors:
1. Parameters that are identifiable in all scenarios, with posterior distributions that are clearly distinct from the priors. An example is given by parameter D, which controls the split of outflow from the UR between the fast and the slow reservoirs.
2. Parameters that are well identified in scenario CS but lose identifiability in scenarios NCS and RS. An example is given by parameter k WR , which controls the outflow of the snow reservoir.
3. Parameters that are weakly identified in all the scenarios, with posterior distributions that differ little to moderately from the priors. An example is given by parameter k SR , which controls the outflow of the SR.
In terms of the other model parameters, β UR and k FR follow the behavior of the first group, and the UR parameter with subscript "max" follows the behavior of the second group.
In Experiment 2 (scenarios CS and CH), panel (a) shows the simulated hydrographs. Reliability is better in scenario CS than in scenario CH (metric values of 0.049 vs. 0.18). Scenario CS also achieves the tightest (best precision) hydrograph, as confirmed by the precision metric shown in panel (d). However, a closer inspection of the hydrograph indicates that while scenario CH yields a larger uncertainty in winter, it has lower uncertainty in summer. This behavior is not specific to the selected catchment and period, but is common across all catchments, especially in years when precipitation is scarce during late spring.
The median hydrograph (dashed line) is more dynamic in scenario CS and better captures the magnitude of the peaks. Nevertheless, panel (c) shows a slightly better model performance in scenario CH (NSE 0.69 vs. 0.64 for scenario CS).
Panel (b) shows the simulated FDCs. The predictive distribution generated in scenario CH tends to be more precise, especially for low streamflow. On the other hand, the median of scenario CS is closer to the observed data than that of scenario CH, which tends to overestimate the observed values, particularly for low streamflow. Both simulations tend to underestimate high streamflow. These visual results are confirmed by the performance metrics (panels c and d): the FDC precision is better in scenario CH (0.06 compared to 0.09), the (hydrograph) volumetric bias is worse in scenario CH (0.10 compared to 0.04), and the NSE values of the FDCs are similar. Finally, FDC reliability is better in scenario CS than in scenario CH (0.35 vs. 0.58).

Performance Over All Catchments
Figure 9 reports the performance metrics for scenarios CS and CH for all catchments and periods considered in Experiment 2. From these plots, it is difficult to determine which scenario produces the best "overall" results. In particular:
• Volumetric bias, FDC Nash-Sutcliffe and reliability (panels d, e, and g): the scenarios achieve a comparable performance, both in terms of spread and mean value.
• Hydrograph NSE and FDC precision (panels a and f): the simulations in scenario CH outperform those in scenario CS, with a better mean value and a lower spread for both metrics.
• Hydrograph precision and reliability (panels b and c): the simulations in scenario CS outperform those in scenario CH. In particular, in terms of hydrograph precision, scenario CS shows better mean and spread. In terms of hydrograph reliability, scenario CS has a better mean, but the presence of some outliers considerably increases the spread.
In Experiment 3 (scenarios RS and PS), panel (a) shows the simulated hydrographs. Although scenario PS has narrower uncertainty bounds, we emphasize, as noted in Section 2.1, that in this scenario the standard deviation of the residual error model (σ Ω ) is set to zero and, therefore, streamflow uncertainty is represented solely through parametric uncertainty.
The median hydrograph (dashed line) is more dynamic and closer to the observed data in scenario RS. This behavior is confirmed by the large difference in NSE (panel c) between the two simulations (0.59 for scenario RS, 0.38 for scenario PS).
The biggest difference between the two scenarios can be observed in panel (b), which shows the simulated FDCs. The predictive distribution of scenario RS, as already analyzed in Section 5.1, is consistent with the observations, in terms of the fit of the median as well as the tightness and distribution of the uncertainty bounds. In contrast, in scenario PS, the median of the predictive distribution is far from the observed values and the uncertainty bounds are extremely wide.

Performance Over All Catchments
Figure 11 presents the performance metrics for scenarios PS and RS. Excluding hydrograph precision (panel b) and FDC reliability (panel g), scenario RS achieves performance metrics that are better than, or at least similar to, those in scenario PS. This finding is particularly evident in panels (a and e), which show that scenario RS has higher Nash-Sutcliffe values for both hydrograph and FDC, and in panel (f), which shows that scenario RS achieves a more precise FDC. Hydrograph reliability and volumetric bias (panels c and d), on the other hand, show similar performance for scenarios PS and RS. In particular, hydrograph reliability has the same mean in these two cases, and less spread in scenario PS than in scenario RS; volumetric bias has a lower mean and larger spread for scenario PS than for scenario RS. Finally, panels (b and g) show that scenario PS outperforms scenario RS in hydrograph precision and FDC reliability, with a better mean and narrower spread of metric values.

Experiment 1: Signature Calibration in Different Scenarios
This experiment compares the results of the three signature-calibration scenarios: concomitant (CS), non-concomitant (NCS), and regionalized (RS).We first discuss aspects associated with the quality of model predictions, then the impact of errors in the signatures, and, finally, parameter identifiability.

Quality of Model Predictions
Model predictions are expected to become increasingly uncertain when moving from gauged to progressively more challenging ungauged conditions. Hence, scenario CS is expected to have the smallest uncertainty, because it uses signatures derived from streamflow observed directly at the catchment and period of interest. Scenario NCS is expected to yield higher uncertainties than scenario CS, because transferring the signatures in time requires the use of a signature transfer model, which is subject to uncertainty. Scenario RS is expected to yield the highest uncertainty, because it uses signatures estimated from streamflow measurements in donor catchments using a stochastic FDC model, and transfer of signatures in space is arguably more challenging than transfer in time.
This pattern of increasing uncertainty emerges clearly in Figure 6, and is consistent both for hydrographs (panels a and b) and FDCs (panels e and f). In terms of NSE and precision, both for hydrographs and FDCs, performance is highest for scenario CS, followed by scenario NCS, and then scenario RS. On average, the progression from concomitant to regionalized signatures increases uncertainties in reproducing time series and FDCs and worsens the model fit to the observations. In particular, taking scenario CS as the (baseline) reference, scenario NCS incurs 36% more uncertainty (i.e., worse precision) and a 5.6% decrease in streamflow NSE, while scenario RS incurs 38% more uncertainty and a 17% decrease in streamflow NSE. These values are calculated by taking the average of the hydrograph performance metrics shown in Figure 6.
However, from our perspective in this study, a key objective is to provide a reliable description of uncertainty, essentially regardless of its magnitude.As our focus is on streamflow prediction, it is important that uncertainties in reproducing streamflow time series are not underestimated or overestimated, which would lead to overconfidence or underconfidence respectively.
The results in Figure 6 are reassuring, showing that the Bayesian framework is able to achieve comparable reliability and bias in all scenarios, both in terms of hydrographs (panel c) and FDCs (panel g). Reliability performance in an ungauged catchment is sensitive to the correct quantification of uncertainty in the signature transfer model. Such correct quantification is challenging because, for example, extreme events can make the transfer of signatures in time unreliable. This is the case of the catchment Herisau, which has poor reliability in scenario NCS (panels c and g) and will be discussed in more detail in the following section.
Comparison of our results with previous work is limited by differences in methodologies as well as by the relatively small number of studies that analyzed PUB under different data availability scenarios. The study of Montanari and Toth (2007) is directly related to our work as it compared the results of time-domain calibration, concomitant (scenario CS, in this study) and non-concomitant (scenario NCS) signature calibration, and prior simulations.
Although our inference procedure is different and the case study is based on different signatures, the findings are broadly consistent. In particular, we show an average decrease of streamflow NSE of approximately 5.6% when moving from concomitant to non-concomitant signatures, and that the magnitude of this decrease is highly dependent on the conditions of the period considered.
Biondi and De Luca (2016) compared calibration to observed signatures (our scenario CS) with calibration to regionalized signatures (our scenario RS), with the objective of estimating design floods for an assigned return period. Their results showed that, for their objective, the use of regionalized signatures does not deteriorate model performance. Our study, instead, shows a better precision in scenario CS than in scenario RS, but also comparable performance in other metrics (e.g., FDC reliability and volumetric bias).

Impact of Errors in the Signatures
When signatures are derived using data from a different time period or catchment, their numerical value can be very different from the value computed from observed hydrographs in the catchment and time period of interest. This difference is due to the uncertainty associated with the signature model. Time transfer of signatures incurs uncertainty due to time variability of hydrological behavior, whereas signature regionalization uncertainty is affected by unaccounted spatial variability of the signatures.
This study shows the effect of errors associated with the transfer of signatures in time or space:
• Transfer in time. In scenario NCS some of the catchments had model performance in one period that was far worse than in the other periods. For example, this is the case for the catchment Herisau in period P3 (blue crosses in Figure 6). Poor model performance in this catchment may be due to the large flood in 2002: such extreme streamflow events have a large impact on the FDCs, especially considering that they usually last more than a single day. Therefore, in scenario NCS, the model is forced with the meteorological inputs that generated the flood but calibrated to signatures that do not contain information about this flood. This mismatch could lead to unrealistic parameter values and explain the mediocre fit.
• Transfer in space. The FDC model has some apparent deficiencies in capturing the observed FDCs, already discussed in Section 4.4.3. These deficiencies led to our decision to augment the FDC model with a bias correction term. Figure 12 illustrates the effect of not implementing bias correction, and compares the results of scenario RS with and without bias correction (bias multiplier set to 1 in Equation 20). Overall, results are worse if bias correction is omitted: uncertainty in the hydrograph and FDC increases, the fit of their medians to observed data is reduced, and performance is worse in almost all the metrics.
The two examples above highlight that a small difference in estimated signatures can lead to a large difference in streamflow predictions. This sensitivity has also been noted in previous studies. For example, Westerberg et al. (2011) discussed the concept of "disinformative" signatures, Castiglioni et al. (2010) showed the impact of errors in the regionalized signatures due to deficiencies in the regionalization model, and Fenicia et al. (2018) demonstrated the effect of time shifts in the time series that are not captured by the signatures. All these studies reported that the presence of small errors in the signatures may lead to large differences in the model predictions, as shown in this study.

Parameter Identifiability
Parameter identifiability generally depends on the calibration scenario. In particular, as the signatures on which the model is calibrated become more uncertain, model parameters become less identifiable. However, such behavior is not uniform for all model parameters, but strongly depends on the specific parameter, on the processes that it is intended to represent, and on whether such processes are captured by the signatures used for model calibration. This behavior is shown in Figure 7, which illustrates how some parameters lose identifiability when calibrating to estimated signatures. The following interpretation can be provided for specific cases:
• Parameters that remain identifiable in all scenarios. These are parameters that have a strong influence on the model output. An example is parameter D, which controls the splitting of the flow between the "fast" and the "slow" part of the model. Changes in this parameter would strongly affect the FDCs in all scenarios, since the slope of the FDC can be related to the proportion of quick and slow response in the catchment.
• Parameters that lose their identifiability when moving to an ungauged scenario. These are parameters that exert an influence on the signatures, but this influence may be blurred by higher uncertainty in the target signatures. For example, parameter k WR , which controls the outflow of the snow reservoir, exhibits this behavior. The transfer error may contribute to obscuring this influence, but the omission of snow dynamics in the regionalization model can also be responsible; this is the case of the FDC model, which does not explicitly distinguish between rainfall and snowfall.
• Parameters that are weakly identifiable, regardless of the scenario. These are parameters that have an effect on the model output that is not captured by the signatures. For example, consider parameter k SR , which controls the release rate of the "slow" part of the model. This parameter has a strong effect on the hydrograph, by affecting the baseflow component, but its effect on the FDCs may be less evident. This difference compromises its identifiability when the model is calibrated to FDCs. A different choice of signatures (e.g., including the baseflow index) would probably have helped in this specific case, but this analysis is not in the scope of this paper.
Similar results regarding parameter identifiability have been highlighted by Biondi and De Luca (2016), who showed that using observed signatures makes the posterior distributions sharper than when using modeled signatures.

Experiment 2: Comparison With Time-Domain Calibration
The aim of Experiment 2 is to assess the overall performance of signature calibration, in its best scenario (CS), using the classical time-domain calibration as a benchmark.
The results in Figure 9 (Section 5.2.2) show that scenarios CS and CH present a tradeoff in terms of fit to the observed data, uncertainty, and performance metrics. For example, scenario CS outperforms scenario CH in terms of hydrograph precision and reliability, but ranks lower in terms of hydrograph NSE. This finding indicates that calibration to seasonal FDCs is not equivalent to calibration to streamflow time series, and that the choice of signatures needs a careful assessment. Note that a comprehensive comparison between signature and time-domain calibration is beyond the scope of this study (e.g., see Fenicia et al., 2018 for an earlier dedicated investigation).
Our findings are in line with previous studies (e.g., Castiglioni et al., 2010; Kavetski et al., 2018), which have shown that signature calibration can achieve similar or slightly inferior performance to time-domain calibration and that, in some cases, the FDCs are a viable surrogate for the hydrograph in the context of model calibration.
On the other hand, Kim et al. (2017) pointed out how the lack of time information in the FDCs can reduce their utility, and suggested the combination of FDCs with other signatures, such as the flashiness index, that include timing information (see also Fenicia et al., 2018).
The study of loss of information due to the usage of different signatures is, however, a broad and important topic that goes beyond the scope of this paper and requires much further work.

Experiment 3: Goodness of the Regionalization Study
The last experiment compares the performance of simulations in the ungauged scenario RS with simulations generated using the prior distribution of model parameters.The objective is to assess the performance of the regionalization approach, comparing it with an "easy" benchmark, in the sense that the effort and the data needed to run prior simulations are limited and can be done in any ungauged catchment as long as meteorological forcing is available.
The results in Figure 10 (Section 5.3.1) show that the simulations in scenario RS generally outperform the simulations in scenario PS, though not without some subtleties. In terms of hydrographs, the two scenarios have similar performance, with scenario RS achieving a better fit to the observed data and scenario PS achieving better precision. However, in terms of FDCs, model performance in scenario PS drops substantially: the simulated FDC looks clearly unrealistic, with huge uncertainty and large deviations from observed data. These results are confirmed by the analysis over all catchments, shown in Figure 11 (Section 5.3.2).
As such, Figure 10 shows that similar uncertainty bounds in the hydrograph space can correspond to very different uncertainty bounds in the signature space. In particular, the difference between scenarios PS and RS is not apparent when considering the total uncertainty bounds in the hydrograph space, where both scenarios have similar precision. However, it becomes obvious in the signature space, where scenario PS has much wider uncertainty bounds than RS in terms of FDCs, which is reflected in a much worse precision metric. The cause of this difference in behavior is as follows. In scenario PS, predictive uncertainty is determined purely by the uncertainty in the precipitation-streamflow model parameters. PS produces hydrographs with a wide range of dynamics (i.e., ranging from relatively constant to very responsive), which translate into FDCs with very different shapes (i.e., ranging from flat to steep) and hence a wide range of variability. In scenario RS, instead, predictive uncertainty is largely determined by the residual error. This scenario produces hydrographs with much more uniform dynamics, which map to similar FDCs.

Limitations and Future Work
This study has a number of limitations that warrant follow-up research.
First, the case study catchments in this work all belong to the same geographic region (the Thur basin). For this reason, they can be thought to be similar, or at least less varied than catchments in different climatic regions. Nevertheless, the Thur basin does exhibit appreciable variability in physical and local hydroclimate characteristics, which in turn manifests as appreciable differences in mean streamflow, in seasonality, and in baseflow (see Dal Molin, Schirmer, et al. (2020) for further details). Future studies could consider large-sample hydrology datasets (e.g., Addor et al., 2017) with wider catchment variability, including arid conditions. This work would likely require revisiting key modeling choices, including the choice of precipitation-streamflow models, signature transfer models, and model performance metrics (e.g., focusing on low flows). Moreover, given the large computational cost of ABC inference (millions of model runs), efficient numerical and software implementations, including parallel computation, would become paramount.
Second, the case study catchments are located in a well-studied area, where previous work could be exploited for the selection of the streamflow model (Dal Molin, Schirmer, et al., 2020) and the FDC model (Doulatyari et al., 2017). The choice of donor catchments also gives an opportunity, in scenario RS, to assume a similar behavior of the regionalization model in the catchments and, therefore, to estimate the parameters of the signatures error model using other catchments of the network. The application of the proposed methodology in less instrumented donor catchments may be more challenging and deserves future attention.
Third, our study focuses primarily on reliable estimation of streamflow predictive uncertainty rather than on its reduction. Further research is needed to understand which choice of signatures, signature estimation models, and precipitation-streamflow models minimizes streamflow uncertainty. In this endeavor, the use of alternative measurements, such as independent remote sensing observations (e.g., Nijzink et al., 2018; Winsemius et al., 2008), or of other types of constraints (e.g., Gharari et al., 2014) is of major interest.
Fourth, the deterministic FDC model does not account for snow dynamics. When precipitation is used in the signature model to calculate its parameters (Section 4.4.3), no distinction is made between rainfall and snowfall. In this work, this limitation does not appear problematic, given the limited amount of snow in the case study area (at most 20% of annual precipitation). Moreover, the stochastic model includes a bias correction that may partly compensate for this deficiency. Importantly, the models calibrated in snow-affected catchments (e.g., Appenzell) do not perform systematically worse than those calibrated in catchments with little snow (e.g., Wängi). Nevertheless, future work should incorporate snow dynamics more explicitly into signature regionalization models.
Finally, this work treats the precipitation-streamflow model and the signature model as conceptually independent from each other. However, it may be of interest to consider the conceptual connections between these types of models. For example, both the signature model of Doulatyari et al. (2017) and the HyMod model represent the catchment as conceptual reservoirs. Future work could explore these similarities and their impact on signature estimation.

Conclusions
This study presented a new approach for calibrating a precipitation-streamflow model to regionalized streamflow signatures. The approach relies on ABC, which avoids direct calculation of the likelihood function (instead, it employs sampling from the probability model, which in this case is easier to implement). We presume the availability of a signature model, which predicts the signatures at the target catchment. The main intended application of the new method is for streamflow time series prediction in ungauged catchments, where current methods are limited by not explicitly accounting for the uncertainty in the RS. We show how to estimate uncertainty in the RS, and how to incorporate this uncertainty into the calibration of a precipitation-streamflow model.
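The idea of replacing likelihood evaluation with sampling can be sketched with a plain rejection ABC scheme. This is a simplified stand-in, not the inference algorithm used in this study: the single-reservoir model, the three quantile signatures, the Gaussian signature error model, the uniform prior range, and the tolerance are all hypothetical choices made for illustration.

```python
import random


def toy_model(k, precip):
    """Hypothetical single linear reservoir draining a fraction k per step."""
    storage, flows = 0.0, []
    for p in precip:
        storage += p
        q = k * storage
        storage -= q
        flows.append(q)
    return flows


def signatures(flows):
    """Toy signature vector: low, median, and high flow quantiles."""
    ordered = sorted(flows)
    n = len(ordered)
    return [ordered[int(f * (n - 1))] for f in (0.1, 0.5, 0.9)]


def abc_rejection(precip, sig_estimate, sig_sd, n_draws, tol, rng):
    """Keep prior draws whose noisy simulated signatures match the estimate."""
    accepted = []
    for _ in range(n_draws):
        k = rng.uniform(0.01, 0.99)  # draw from an assumed uniform prior
        sim = signatures(toy_model(k, precip))
        # Sample from the assumed signature error model instead of
        # evaluating a likelihood.
        noisy = [s + rng.gauss(0.0, sd) for s, sd in zip(sim, sig_sd)]
        distance = max(abs(a - b) for a, b in zip(noisy, sig_estimate))
        if distance <= tol:
            accepted.append(k)
    return accepted


rng = random.Random(0)
precip = [rng.expovariate(1.0) for _ in range(365)]  # synthetic forcing
sig_estimate = signatures(toy_model(0.3, precip))    # stands in for the RS
accepted = abc_rejection(precip, sig_estimate, [0.05, 0.05, 0.05],
                         2000, 0.2, rng)
print(len(accepted), "of 2000 prior draws accepted")
```

The accepted values of `k` approximate the posterior. Widening `sig_sd`, which here plays the role of the uncertainty in the estimated signatures, lets more distant parameter values pass the tolerance and therefore broadens the posterior.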
The proposed framework is evaluated in a case study based on six catchments of the Thur basin, Switzerland. This basin has high-quality data, a long history of hydrological studies, and high operational importance in the water supply of the region. In order to test the quality of predictions in ungauged conditions, we followed a progression from gauged to ungauged, with three scenarios: "concomitant," where signatures are observed, "non-concomitant," where signatures are extrapolated in time, and "regionalized," where signatures are extrapolated in space. For reference, we also calibrated the model in the time domain and ran simulations using parameters sampled from the prior distribution.
The results of the experiments suggest that:
1. In line with expectations, the progression from concomitant to RS increases uncertainties in reproducing streamflow time series (38% more uncertainty) and reduces the fit to the observations (17% decrease in NSE). That said, the uncertainty was reliably estimated in all scenarios, which is an important finding that provides confidence in the proposed approach;
2. Poor quality of model predictions could be attributed to cases where RS are corrupted by large errors, such as in the Herisau catchment, where the model is forced with meteorological inputs that generated a flood but calibrated to signatures that do not contain information about this flood. In such cases, the error in estimating the signatures results in streamflow predictions with poor performance metrics (e.g., doubling the volumetric bias);
3. The use of RS may reduce the identifiability of some parameters of the streamflow model (e.g., the release rate of the snow reservoir). This behavior is likely explained by the representation of some processes (e.g., snow) being lost when calibrating to regionalized rather than observed signatures;
4. Signatures based on seasonal FDCs appear to be an adequate choice for the calibration of precipitation-streamflow models, as their use yields streamflow predictions comparable in quality to those obtained in time-domain calibration. However, this finding may be specific to the case study and requires further corroboration, including in different hydroclimatic conditions;
5. Calibration to RS generates streamflow predictions of clearly better quality than prior simulations, that is, streamflow predictions obtained using prior parameter distributions. This finding shows that RS contain useful information to constrain model parameters and hence predictions.
The proposed methodology represents a step toward improved predictions in data-scarce regions. Future work is needed to generalize the approach to more diverse areas, especially where donor catchments are not available for the calibration of the signature regionalization model, to better understand the impact of signature choice on the results, and to undertake comprehensive comparisons of the proposed methodology to other approaches for PUB.

D2. Precision
The precision metric, P, quantifies the spread of a predictive distribution. It is computed from sdev[v_i], the standard deviation of the predictive distribution of the ith element of v.
Note that the precision metric does not depend on the fit between modeled and observed data. The "best" value of this metric is 0, which occurs for a prediction with no uncertainty, that is, where all predictive replicates form a single curve. Higher values denote worse precision.
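A minimal sketch of how a precision metric with these properties could be computed from predictive replicates is given below. The normalization is an assumption: the sum of the predictive standard deviations is divided by the sum of the predictive means, which keeps the metric independent of the observations and equal to 0 when all replicates coincide. The function name and the use of the population standard deviation are likewise illustrative choices.

```python
import statistics


def precision(replicates):
    """Spread of a predictive ensemble relative to its mean level.

    `replicates` is a list of equally long sequences, one per predictive
    replicate. Assumed form: sum_i sdev[v_i] / sum_i mean[v_i].
    """
    n = len(replicates[0])
    columns = [[r[i] for r in replicates] for i in range(n)]
    total_sdev = sum(statistics.pstdev(c) for c in columns)
    total_mean = sum(statistics.mean(c) for c in columns)
    return total_sdev / total_mean


# Identical replicates form a single curve, so the metric is 0.
print(precision([[1, 2, 3], [1, 2, 3]]))
# Replicates that disagree give a positive value.
print(precision([[1, 1], [3, 3]]))
```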

D3. Reliability
The reliability metric, R, quantifies the consistency of the observations and the predictive distribution. The metric is defined as the area between the predictive quantile-quantile (PQQ) curve and the 1:1 line. The PQQ plot is a uniform quantile-quantile plot of the quantities Pr(V_i ≤ ṽ_i) for i = 1,…,N, where V_i denotes the random variable underlying the predictive distribution from which the observed value ṽ_i is assumed to be sampled. When applied to streamflow time series analysis, V_i is the streamflow at the ith time step, whereas when applied to FDC analysis it is the FDC at the ith quantile. The mathematical derivation of the reliability metric can be found in McInerney et al. (2017). The best value of this metric is 0. Higher values denote worse reliability.
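The construction of the PQQ curve can be sketched as follows. This is an illustration under stated assumptions rather than the exact formulation of McInerney et al. (2017): the predictive probabilities are estimated empirically from replicates, the uniform quantiles use the plotting positions (j + 0.5)/N, and the area is approximated by the mean absolute deviation from the 1:1 line without any further scaling.

```python
def pqq_p_values(observed, replicates):
    """Empirical Pr(V_i <= observed_i), estimated from the replicates."""
    m = len(replicates)
    return [sum(1 for r in replicates if r[i] <= obs) / m
            for i, obs in enumerate(observed)]


def reliability(observed, replicates):
    """Approximate area between the PQQ curve and the 1:1 line."""
    p_sorted = sorted(pqq_p_values(observed, replicates))
    n = len(p_sorted)
    # Compare sorted p-values with uniform quantiles at (j + 0.5) / n.
    return sum(abs(p - (j + 0.5) / n) for j, p in enumerate(p_sorted)) / n


# Ten replicates taking the constant values 0, 1, ..., 9 at every step.
replicates = [[m] * 10 for m in range(10)]

# Observations spread evenly through the predictive range: p-values are
# close to uniform and the metric is small.
well_spread = [i + 0.5 for i in range(10)]
print(reliability(well_spread, replicates))

# Observations always below every replicate: systematic overprediction,
# all p-values are 0 and the metric is large.
too_low = [-1.0] * 10
print(reliability(too_low, replicates))
```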

D4. Volumetric Bias
The volumetric bias metric measures the long-term water balance error of the predicted data. The mean in the numerator of Equation D3 follows the convention from earlier studies (e.g., Ehlers et al., 2019; McInerney et al., 2017; Oliveira et al., 2018) and is less susceptible to noise/outliers due to the integration (summation) over a long time series.
The best value of the volumetric bias metric is 0 and higher values denote an increasing water balance discrepancy.
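A minimal sketch of a volumetric bias computation consistent with this description is given below. The exact form is an assumption: the absolute difference between the summed predictive means and the summed observations, divided by the summed observations. The function name is illustrative.

```python
import statistics


def volumetric_bias(observed, replicates):
    """Relative long-term water balance error of the predictive mean.

    Assumed form: |sum_i mean[v_i] - sum_i observed_i| / sum_i observed_i.
    """
    n = len(observed)
    predicted_total = sum(statistics.mean([r[i] for r in replicates])
                          for i in range(n))
    observed_total = sum(observed)
    return abs(predicted_total - observed_total) / observed_total


# Predictive means are 2, 3, 4 (total 9) against observations totaling 6.
print(volumetric_bias([1, 2, 3], [[1, 2, 3], [3, 4, 5]]))
```

Because the totals are compared rather than the individual time steps, errors of opposite sign at different steps cancel, which is why the metric reflects the long-term water balance rather than the fit at each step.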

Figure 1. Schematic of the inference setup used in this work to estimate the streamflow model parameters θ(H). Panel (a) illustrates the inference in a scenario where signatures can be computed directly from observed streamflow. Panel (b) illustrates the inference in scenarios where signatures are estimated using a regionalization model, which itself requires calibration (red dashed box).

Figure 2. Schematic of the algorithm used to infer the parameters of the precipitation-streamflow model. Panel (a) shows the algorithm used in scenario concomitant signatures (CS); panel (b) shows the algorithm used in scenarios non-concomitant signatures (NCS) and regionalized signatures (RS). Gray background is used to indicate the step of calculating the distance metric, which is the only difference between the initial sampling and evolution in scenario CS versus scenarios NCS and RS.

Figure 3. Map of the Thur basin indicating its catchments. Streamflow gauging stations are indicated with yellow dots.

Figure 4. Schematic representation of the lumped streamflow model used in the case study. "P" represents the precipitation entering the reservoirs, "E" the evaporation, and "Q" the outflow from the reservoirs. The subscripts indicate the reservoirs: WR, snow reservoir; UR, unsaturated reservoir; FR, fast reservoir; SR, slow reservoir. The governing equations are reported in Appendix B.

Figure 5 shows the results of scenarios CS, NCS, and RS for the representative Andelfingen catchment in period P3. Panel (a) shows the simulated hydrographs and panel (b) shows the simulated FDCs. Panels (c and d) show the values of the performance metrics.

Figure 5. Results of Experiment 1 for a representative catchment (Andelfingen) and time period (P3). Panel (a) shows the hydrograph, panel (b) the annual flow duration curve, panels (c and d) the metrics used for evaluating the simulations. In panels (a and b), the dashed line represents the median simulation; the solid lines represent 95% uncertainty bounds; the black dots represent the observed data.

Figure 6. Performance metrics achieved by the simulations in Experiment 1. The left column reports the metrics calculated on the hydrograph; the right column reports the metrics calculated on the annual flow duration curve. Catchments are distinguished by color and periods are distinguished by symbol. Note that some horizontal axes are reversed so that "better" metric values appear consistently on the right in all plots.

Figure 7. Posterior distributions of three selected parameters, demonstrating qualitatively different behaviors in the scenarios. The histogram bars represent the posterior distributions. The black dashed lines represent the prior distributions.

Figure 8. Results of Experiment 2 for a representative catchment (Andelfingen) and time period (P3). Panel (a) shows the hydrograph, panel (b) the annual flow duration curve, panels (c and d) the metrics used for evaluating the simulations. In panels (a and b), the dashed line represents the median simulation; the solid lines represent 95% uncertainty bounds; the black dots represent the observed data.

Figure 9. Performance metrics achieved by the simulations in Experiment 2. The left column reports the metrics calculated on the hydrograph; the right column reports the metrics calculated on the annual flow duration curve. Catchments are distinguished by color and periods are distinguished by symbol. Note that some horizontal axes are reversed so that "better" metric values appear consistently on the right in all plots.

Figure 10. Results of Experiment 3 for a representative catchment (Andelfingen) and time period (P3). Panel (a) shows the hydrograph, panel (b) the annual flow duration curve, panels (c and d) the metrics used for evaluating the simulations. In panels (a and b), the dashed line represents the median simulation; the solid lines represent 95% uncertainty bounds; the black dots represent the observed data.

Figure 11. Performance metrics achieved by the simulations in Experiment 3. The left column reports the metrics calculated on the hydrograph; the right column reports the metrics calculated on the annual flow duration curve. Catchments are distinguished by color and periods are distinguished by symbol. Note that some horizontal axes are reversed so that "better" metric values appear consistently on the right in all plots.

Figure 12. Results of the scenario regionalized signatures with and without bias correction for a representative catchment (Andelfingen) and time period (P3). Panel (a) shows the hydrograph, panel (b) the annual flow duration curve, panels (c and d) the metrics used for evaluating the simulations. In panels (a and b), the dashed line represents the median simulation; the solid lines represent 95% uncertainty bounds; the black dots represent the observed data.
In the metric definitions, mean[v_i] denotes the mean of the predictive distribution of the ith element of v.