Understanding the time-varying importance of different uncertainty sources in hydrological modelling using global sensitivity analysis

Simulations from hydrological models are affected by potentially large uncertainties stemming from various sources, including model parameters and observational uncertainty in the input/output data. Understanding the relative importance of such sources of uncertainty is essential to support model calibration, validation and diagnostic evaluation and to prioritize efforts for uncertainty reduction. It can also support the identification of 'disinformative data' whose values are the consequence of measurement errors or inadequate observations. Sensitivity analysis (SA) provides the theoretical framework and the numerical tools to quantify the relative contribution of different sources of uncertainty to the variability of the model outputs. In traditional applications of global SA (GSA), model outputs are aggregations of the full set of a simulated variable. For example, many GSA applications use a performance metric (e.g. the root mean squared error) as model output that aggregates the distances of a simulated time series to available observations. This aggregation of propagated uncertainties prior to GSA may lead to a significant loss of information and may cover up local behaviour that could be of great interest. Time-varying sensitivity analysis (TVSA), where the aggregation and SA are repeated at different time steps, is a viable option to reduce this loss of information. In this work, we use TVSA to address two questions: (1) Can we distinguish between the relative importance of parameter uncertainty versus data uncertainty in time? (2) Do these influences change in catchments with different characteristics? To our knowledge, the results present one of the first quantitative investigations on the relative importance of parameter and data uncertainty across time. We find that the approach is capable of separating influential periods across data and parameter uncertainties, while also highlighting significant differences between the catchments analysed.
Copyright © 2016 The Authors. Hydrological Processes . Published by John Wiley & Sons Ltd.


PREFACE
One of Keith Beven's major contributions lies in the field of uncertainty analysis. Following pioneering work by Freeze (1980), he was one of the first to introduce Monte Carlo strategies for uncertainty assessment in hydrological models (Binley et al., 1989; Beven and Binley, 1992), and he has pushed the topic of uncertainty analysis forward for almost a quarter of a century. His efforts are a main reason why uncertainty analysis in hydrology is more advanced than in most other environmental or natural hazard fields (Rougier et al., 2013). We bring two of Keith's major contributions together in the research discussed here. First, we use a time-varying implementation of sensitivity analysis that can be traced back to Keith's work on generalized likelihood uncertainty estimation (GLUE) (Beven and Binley, 1992; Freer et al., 1996). GLUE brought an easy-to-implement and effective approach to analysing the parameter uncertainty of hydrological models. It also initiated a discussion of the validity of statistical assumptions given the specific nature of hydrological models (including strong nonlinearity and potentially large model structural errors). This discussion also opened the path for an investigation into which metrics are hydrologically relevant, rather than just statistically convenient. These questions, e.g. regarding the appropriate likelihood function based on these statistical assumptions in the context of such models, have still not been answered (e.g. Stedinger et al., 2008). These discussions are unlikely to go away unless we understand how to build likelihood functions that realistically account for all sources of error. Second, we consider Keith's more recent efforts focused on the implications of 'disinformative data', i.e. data points that are erroneous and negatively influence the model calibration or evaluation process. A key question is how we identify data points whose values are the consequence of measurement errors or inadequate observations (such

INTRODUCTION
Sensitivity and uncertainty analysis have become common practice in hydrological modelling. One strand of methods is based on Monte Carlo sampling of the parameter space and on conditioning the sampled parameter sets using one or more objective functions. This approach originates in the regional sensitivity analysis (RSA) method introduced by Young et al. (1978) and Spear and Hornberger (1980), in which parameter sets are separated into behavioural (well performing) and non-behavioural groups. A parameter matters for the model output if its marginal distributions in these two groups differ. Spear and Hornberger (1980) tested their approach on an algal bloom problem in a lake, for which each sampled parameter set either produced an algal bloom (bad) or not (good). This binary outcome made the separation into behavioural and non-behavioural groups simple and unambiguous. Beven and Binley (1992) generalized RSA by showing how parameter sets can be conditioned on any performance metric if it is appropriately transformed (so that it has some, but not all, of the characteristics of a likelihood function). A parameter set is deemed behavioural if the associated value of the performance metric is above a threshold prescribed by the modeller (or below, if the performance metric is to be minimized). They found that, in most cases, a wide range of behavioural parameter sets can be found for hydrological models. Keith Beven and colleagues referred to this finding as the problem of equifinality. Since then, RSA based on conditioning on performance metrics has been widely applied to investigate parameter uncertainty and the relative influence of parameters in hydrological and environmental modelling (e.g. Freer et al., 1996; Wagener et al., 2001; Sieber and Uhlenbrook, 2005).
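The RSA idea described above can be illustrated with a minimal sketch. It assumes a performance metric to be minimized and measures the difference between the behavioural and non-behavioural marginal distributions with the two-sample Kolmogorov-Smirnov distance between empirical CDFs (a common choice, used here for illustration); the function name `rsa_ks` and the toy model are hypothetical.

```python
import numpy as np


def rsa_ks(params, metric, threshold):
    """Regional sensitivity analysis (RSA) sketch.

    params    : (N, M) array of sampled parameter sets
    metric    : (N,) performance metric to be minimized (e.g. RMSE)
    threshold : sets with metric < threshold are labelled behavioural
    Returns, for each parameter, the maximum distance between its empirical
    marginal CDFs in the behavioural and non-behavioural groups.
    """
    behavioural = metric < threshold
    if behavioural.all() or not behavioural.any():
        raise ValueError("threshold must split the sample into two non-empty groups")
    distances = []
    for j in range(params.shape[1]):
        b = np.sort(params[behavioural, j])
        nb = np.sort(params[~behavioural, j])
        grid = np.concatenate([b, nb])  # the empirical CDFs only jump at sample points
        cdf_b = np.searchsorted(b, grid, side="right") / b.size
        cdf_nb = np.searchsorted(nb, grid, side="right") / nb.size
        distances.append(np.max(np.abs(cdf_b - cdf_nb)))
    return np.array(distances)


# Toy example: the metric depends on the first parameter only
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(2000, 2))
rmse = np.abs(X[:, 0] - 0.3) + 0.01 * rng.normal(size=2000)
d = rsa_ks(X, rmse, threshold=0.1)
print(d)  # first entry much larger than the second
```

A parameter whose behavioural and non-behavioural CDFs nearly coincide (the second one here) would be judged non-influential.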
One issue with conditioning on performance metrics in RSA is that performance metrics aggregate model residuals across the whole time series used for calibration. This temporal aggregation unavoidably leads to a loss of information (e.g. Freer et al., 2003). Sometimes, a few very large residuals dominate the value of the performance metric, especially if the residuals are squared before aggregation (as is done, for instance, in the root mean squared error or the Nash-Sutcliffe efficiency). Wagener et al. (2003) suggested applying a variation of the parameter conditioning used in GLUE as a time-varying algorithm in their dynamic identifiability analysis (DYNIA). They estimate the performance metric value as a running mean using different window sizes. This approach reduces the loss of information and allows for an assessment of which periods are most informative for parameter calibration and of which data points might be erroneous. They visualize the conditional marginal cumulative distribution function (CDF) of each parameter, so that they can both separate periods in which conditioning takes place or not (i.e. where the data are informative or not) and see which part of the parameter space performs better (e.g. to test whether different parameter values are required for different system response modes, an indicator of model structural problems such as missing model dynamics). The chosen window size allows the analysis to be tailored to the scale of influence of each parameter (Massmann et al., 2014): parameters controlling the quick recession process require shorter window sizes than those controlling baseflow or water balance processes. There has been a flurry of studies using different analysis methods for time-varying SA (e.g. Wagner and Harvey, 1997; Wagener et al., 2003; Cloke et al., 2008; Reusser and Zehe, 2011; Kelleher et al., 2013; Herman et al., 2013a, b; Guse et al., 2014).
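The core of the DYNIA idea can be sketched as follows. This is a simplified version under stated assumptions, not the original algorithm: performance is a moving-window RMSE, and each time step is summarized by the histogram of one parameter over the best-performing fraction of the sample (the original method uses likelihood-weighted conditional CDFs). The function name and the toy data are hypothetical.

```python
import numpy as np


def dynia_histograms(sim, obs, param, w, top_frac=0.05, bins=10, lo=0.0, hi=1.0):
    """Simplified DYNIA-style analysis.

    sim   : (N, T) simulated flows, one row per parameter set
    obs   : (T,) observed flows
    param : (N,) value of the parameter of interest in each set
    Returns a (T, bins) array: at each time step, the relative frequency of
    the parameter over the top_frac best sets, ranked by a moving-window RMSE.
    """
    N, T = sim.shape
    sq = (sim - obs) ** 2
    n_top = max(1, int(round(top_frac * N)))
    edges = np.linspace(lo, hi, bins + 1)
    freq = np.zeros((T, bins))
    for t in range(T):
        a, b = max(0, t - w), min(T, t + w + 1)  # window truncated at the edges
        rmse = np.sqrt(sq[:, a:b].mean(axis=1))
        best = np.argsort(rmse)[:n_top]
        counts, _ = np.histogram(param[best], bins=edges)
        freq[t] = counts / n_top
    return freq


# Toy example: the best parameter value changes halfway through the record
rng = np.random.default_rng(1)
N, T = 400, 100
p = rng.uniform(0.0, 1.0, N)
obs = np.ones(T)
optimum = np.where(np.arange(T) < T // 2, 0.2, 0.8)
sim = obs + (p[:, None] - optimum[None, :])
freq = dynia_histograms(sim, obs, p, w=5)
```

Plotting `freq.T` as an image against time gives the kind of colour map used in DYNIA: early time steps concentrate around 0.2, later ones around 0.8.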
While the assessment and consideration of parameter uncertainty have become common practice, there is also increasing recognition that data uncertainty can have a significant influence on model calibration and validation. Keith Beven and colleagues referred to this problem as that of disinformative data (Beven et al., 2008; Beven and Westerberg, 2011). Specific data points, when erroneous, can have a disproportionate impact on model calibration or evaluation, and approaches to identify them are still poorly developed in hydrology. Initial work demonstrates, for example, that event-based water balance estimates derived from the streamflow record can show runoff volumes much larger than is feasible given the observed rainfall (Beven and Smith, 2014). Such unrealistic data points can lead to large residuals, which, in turn, strongly influence performance values. It is therefore important to be able to identify these data points and to decide appropriately on their validity.
In this paper, we address two questions: (1) Can we distinguish between the relative importance of parameter uncertainty versus data uncertainty in time? (2) How do relative influences change in places with different catchment characteristics? To answer these questions, we introduce a time-varying implementation of a recently proposed density-based SA approach called PAWN (Pianosi and Wagener, 2015). As a hydrological model, we use the widely applied lumped Hydrologiska Byråns Vattenbalansavdelning (HBV) model (Bergstrom, 1995). We repeat our analysis for three catchments in the USA that span different hydroclimatic regimes and geographic locations. We assess the relative importance of data and parameter error/uncertainty in time and investigate to what extent SA provides a formalized approach to identify periods in which data uncertainty could have a disproportionately large influence. Because we consider parameters and data as sources of uncertainty, but not model structure, our approach to identify potential disinformation is conditional on the hydrological model being adopted (the HBV model in our case). As such, the approach is complementary to other methods (e.g. Beven and Smith, 2014) that use a more general model, which, however, captures only part of the runoff generation process (for example, the event runoff coefficient).
Our results present one of the first quantitative investigations of the relative importance of parameter and data uncertainty in time. Understanding this variability is relevant for investigations into additional data collection needs and model calibration/evaluation.

Hydrological model and study sites
The hydrological model investigated in this study is the lumped HBV conceptual model. It includes three components: a snow accumulation/melting module, a soil moisture accounting module and a flow routing module. The forcing input data are time series of temperature, precipitation and potential evapotranspiration. The model is described in various articles, e.g. Bergstrom (1995), Seibert (1997) and Kollat et al. (2012). A schematic is given in Figure 1.
At each time step, the model classifies precipitation as either rainfall or snowfall depending on whether temperature is above or below a given threshold (TS). Snowfall and rainfall contribute to the water balance of the solid and liquid components of the snowpack, respectively. Exchanges between the two components occur through either snowmelt or refreezing, depending on whether the temperature is above or below the threshold TS. The amount of snowmelt or refreezing is linearly proportional to temperature via two proportionality coefficients, CFMAX and CFR (see again the schematic in Figure 1). When the liquid component exceeds the holding capacity of the snowpack (CWH), the excess water leaves the snowpack and enters the soil moisture accounting module. The implementation of the soil moisture accounting and flow routing modules is the same as in Kollat et al. (2012), which includes three parameters (β, LP and FC) for soil moisture accounting and six parameters (PERC, K0, K1, K2, UZL, MAXBAS) for flow routing. The meaning, units of measurement and range of variation of these parameters are summarized in Table I.
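The snow routine just described can be sketched for a single daily step. The functional forms are assumptions borrowed from common HBV implementations (e.g. Seibert, 1997), not necessarily the exact equations used in this study: melt = CFMAX·(T - TS), refreezing = CFR·CFMAX·(TS - T), and CWH expressed as a fraction of the solid component. The function name `snow_step` is hypothetical.

```python
def snow_step(precip, temp, solid, liquid, TS, CFMAX, CFR, CWH):
    """One daily step of an HBV-like snow routine (sketch, assumed equations).

    precip, temp  : precipitation [mm/day] and temperature [deg C]
    solid, liquid : current solid and liquid water content of the snowpack [mm]
    Returns (solid, liquid, outflow); outflow [mm/day] goes to the soil module.
    """
    if temp > TS:
        rain, snow = precip, 0.0
        # degree-day melt, limited by the available snow (skipped if there is none)
        melt = min(CFMAX * (temp - TS), solid) if solid > 0 else 0.0
        refreeze = 0.0
    else:
        rain, snow = 0.0, precip
        melt = 0.0
        # assumed HBV-light form: CFR scales CFMAX for refreezing
        refreeze = min(CFR * CFMAX * (TS - temp), liquid) if liquid > 0 else 0.0
    solid += snow + refreeze - melt
    liquid += rain + melt - refreeze
    capacity = CWH * solid                 # liquid water the pack can retain
    outflow = max(0.0, liquid - capacity)  # excess leaves the snowpack
    liquid -= outflow
    return solid, liquid, outflow


# Melt day: 10 mm of snow, 2 degrees above the threshold
print(snow_step(0.0, 2.0, 10.0, 0.0, TS=0.0, CFMAX=3.0, CFR=0.05, CWH=0.1))
# Module switched off as for snow-free catchments (TS = -inf, other parameters 0)
print(snow_step(5.0, -10.0, 0.0, 0.0, TS=float("-inf"), CFMAX=0.0, CFR=0.0, CWH=0.0))
```

With TS = -∞ every day is classified as rain and passes straight through, which is how the module is switched off for the French Broad and Guadalupe catchments.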
The model is applied to simulate streamflow in three catchments in the USA with very different climatic characteristics: the English River in Iowa, a relatively humid, snow-affected catchment; the French Broad River in North Carolina, a very wet catchment; and the Guadalupe River in Texas, a very dry catchment. Time series of daily streamflows and meteorological inputs (precipitation, temperature and potential evapotranspiration) for these catchments were developed as part of the Model Parameter Estimation Experiment (Duan et al., 2006). The characteristics of the three catchments are summarized in Table II.

Characterization of the uncertainty sources
The goal of this study is to assess the relative importance of parameter and data uncertainty for model accuracy. We group sources of uncertainty into six groups: (1) the observational uncertainty in the precipitation time series, (2) the uncertainty in the potential evapotranspiration time series, (3) the uncertainty in the four parameters of the snow accumulation/melting model component, (4) the uncertainty in the three parameters of the soil moisture accounting component, (5) the uncertainty in the six parameters of the flow routing component, and (6) the uncertainty in streamflow observations used to evaluate model performance. These sources of uncertainty are characterized as follows.
Parameter uncertainty is described by assuming independent uniform distributions with the ranges reported in Table I. These ranges were defined by combining a priori knowledge about the physical meaning of each parameter with a preliminary evaluation of the model's behaviour in each study area. Kollat et al. (2012) provide a set of wide parameter ranges that should cover catchments with any hydroclimatic characteristics across the USA. Sampling from those ranges, we ran Monte Carlo simulations and identified behavioural parameterizations by defining a set of thresholds on model performance (we considered the root mean squared error, the absolute mean error and the bias). The range of variation of the behavioural parameterizations was then taken as the uncertainty range for each study site. These ranges are reported in Table I. As can be noted from the table, some of these ranges are still quite large, and for some parameters (for example, LP and K2) they are the same as in Kollat et al. (2012), which means that conditioning on performance did not constrain them. This is a consequence of our choice of quite loose performance thresholds, which, in turn, reflects the fact that our aim is to define ranges that reasonably reflect our uncertainty in model parameters, rather than to identify a small set of highly performing parameterizations.
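The range-refinement step can be sketched as follows, assuming all metrics are oriented so that lower is better (e.g. RMSE, absolute mean error, absolute bias) and that a set is behavioural only if it satisfies every threshold. The function name, the thresholds and the toy metrics are hypothetical.

```python
import numpy as np


def behavioural_ranges(params, metrics, thresholds):
    """Uncertainty ranges from behavioural Monte Carlo samples (sketch).

    params     : (N, M) parameter sets sampled from wide prior ranges
    metrics    : (N, K) performance metrics, all oriented so that lower is better
    thresholds : (K,) acceptability limit for each metric
    Returns an (M, 2) array with the min and max of each parameter over the
    sets that satisfy all thresholds.
    """
    ok = np.all(metrics <= np.asarray(thresholds), axis=1)
    if not ok.any():
        raise ValueError("no behavioural sets: loosen the thresholds")
    beh = params[ok]
    return np.column_stack([beh.min(axis=0), beh.max(axis=0)])


# Toy example: only the first parameter affects performance
rng = np.random.default_rng(2)
P = rng.uniform(0.0, 1.0, size=(5000, 2))
M = np.column_stack([np.abs(P[:, 0] - 0.5),          # stand-in for RMSE
                     0.1 * np.abs(P[:, 0] - 0.5)])   # stand-in for |bias|
ranges = behavioural_ranges(P, M, thresholds=[0.1, 0.05])
print(ranges)  # first row narrowed to about [0.4, 0.6], second row stays wide
```

The second parameter plays the role of LP or K2 in Table I: the thresholds do not constrain it, so its range stays essentially the prior one.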
As for data uncertainty, a range of studies has assessed how much uncertainty can be expected in certain measurements of hydrological and meteorological variables … (2012). Here, we assume what we believe are typical ranges for the circumstances of our case studies. Precipitation uncertainty was described using storm-dependent rainfall depth multipliers as proposed by Kavetski et al. (2002, 2006), which corresponds to the assumption that precipitation errors are multiplicative and that the magnitude of the multiplicative error varies from storm to storm. The results discussed in the following are obtained using storm-dependent multipliers drawn from a uniform distribution over the interval [0.6, 1.4], which corresponds to assuming a maximum error in precipitation data of ±40%.
Multiplicative errors are also used for potential evapotranspiration; however, here a constant multiplier is used for the entire time series.We assumed a uniform distribution over [0.8, 1.2] for this multiplier, thus allowing for a maximum error of ±20%.
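Both input-error models can be sketched together. The storm definition is an assumption made here for illustration (a storm is a run of consecutive days with non-zero precipitation); the study may delimit storms differently. The function name and the toy series are hypothetical.

```python
import numpy as np


def perturb_forcing(precip, pet, rng, p_range=(0.6, 1.4), pet_range=(0.8, 1.2)):
    """Multiplicative input errors (sketch).

    Precipitation: one multiplier per storm, where a storm is ASSUMED to be a
    run of consecutive wet days (precip > 0).
    PET: a single multiplier for the whole time series.
    """
    precip = np.asarray(precip, dtype=float)
    wet = precip > 0
    # a storm starts on a wet day that follows a dry day (or on the first day)
    starts = wet & ~np.concatenate(([False], wet[:-1]))
    storm_id = np.cumsum(starts) - 1          # storm index of each wet day
    multipliers = rng.uniform(*p_range, size=int(starts.sum()))
    factor = np.ones_like(precip)
    factor[wet] = multipliers[storm_id[wet]]
    pet_mult = rng.uniform(*pet_range)
    return precip * factor, np.asarray(pet, dtype=float) * pet_mult, multipliers, pet_mult


precip = np.array([0.0, 2.0, 3.0, 0.0, 0.0, 5.0, 0.0, 1.0, 1.0])
pet = np.full(precip.size, 2.5)
p_pert, pet_pert, mult, pet_mult = perturb_forcing(precip, pet, np.random.default_rng(3))
print(mult)      # three storms, hence three multipliers
print(p_pert)
```

All days of a storm share the same relative error, while dry days stay dry; this preserves the timing of precipitation, which is the limitation discussed later for the English River.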
Finally, for flow data, we used an additive error model in which errors are described by an autocorrelated heteroscedastic Gaussian process, with zero mean and variance linearly proportional to the flow (Schoups and Vrugt, 2010). The two parameters of this model are set to ensure that 99% of the errors on flow fall within the interval $\pm 0.2\, q^{obs}_t$, i.e. a maximum error in flow observations of ±20%. More details about this model and the procedure to set its parameters are given in the Appendix.
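A minimal sketch of an autocorrelated heteroscedastic error generator follows. It is not the Appendix procedure: it assumes an AR(1) standard Gaussian process with a hypothetical lag-1 coefficient rho = 0.9, and, as a simplifying assumption, an error standard deviation proportional to flow, chosen so that a two-sided 99% Gaussian band (z ≈ 2.576) equals ±0.2 q at every flow level.

```python
import numpy as np


def perturb_flow(q_obs, rng, rho=0.9, max_rel_err=0.2, z99=2.576):
    """Additive, autocorrelated, heteroscedastic Gaussian flow errors (sketch).

    Assumption: sigma_t = (max_rel_err / z99) * q_obs[t], so that about 99% of
    the errors lie within +/- max_rel_err * q_obs[t]. rho is a hypothetical
    lag-1 autocorrelation coefficient.
    """
    q_obs = np.asarray(q_obs, dtype=float)
    z = np.empty(q_obs.size)
    z[0] = rng.standard_normal()
    innov = np.sqrt(1.0 - rho ** 2)  # keeps z[t] ~ N(0, 1) at every step
    for t in range(1, q_obs.size):
        z[t] = rho * z[t - 1] + innov * rng.standard_normal()
    sigma = (max_rel_err / z99) * q_obs
    return q_obs + sigma * z


rng = np.random.default_rng(4)
q = np.full(20000, 5.0)
q_pert = perturb_flow(q, rng)
inside = np.mean(np.abs(q_pert - q) <= 0.2 * q)
print(inside)  # close to 0.99
```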

Definition of the performance metric
In our study, the performance metric used to measure model accuracy synthetically is the root mean squared error (RMSE). Because our goal is to investigate how relative influences vary in time, we compute one value of the RMSE for each time step along the simulation period, using a moving window centred on that time step, i.e.
$$\mathrm{RMSE}_t = \sqrt{\frac{1}{2w+1}\sum_{k=t-w}^{t+w}\left(q^{sim}_k - q^{obs}_k\right)^2} \tag{1}$$
where $q^{sim}_k$ is the simulated flow on day k, $q^{obs}_k$ is the observed flow, t is the time step under analysis, and w is the semi-length of the moving window.
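The moving-window RMSE can be computed for all time steps at once. A minimal NumPy sketch follows, assuming that time steps closer than w to either end of the record are simply dropped (the text does not state how edges are handled); the function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_rmse(q_sim, q_obs, w):
    """RMSE_t for every t with a complete window [t - w, t + w].

    q_sim : (T,) or (N, T) simulated flows (N model evaluations)
    q_obs : (T,) observed flows
    Returns (..., T - 2w) values; entry j corresponds to time step t = j + w.
    """
    sq = (np.asarray(q_sim, dtype=float) - np.asarray(q_obs, dtype=float)) ** 2
    if sq.shape[-1] < 2 * w + 1:
        raise ValueError("time series shorter than the moving window")
    windows = sliding_window_view(sq, 2 * w + 1, axis=-1)
    return np.sqrt(windows.mean(axis=-1))


q_obs = np.zeros(10)
q_sim = np.arange(10.0)
r = moving_rmse(q_sim, q_obs, w=1)
print(r[0])  # sqrt((0 + 1 + 4) / 3)
```

Passing a matrix of simulations (one row per model evaluation) yields the sample of RMSE values at each time step from which the CDFs used in the sensitivity analysis are built.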

Global sensitivity analysis method: PAWN
In global SA (GSA), each source of uncertainty (or input factor, in GSA terminology) is associated with a sensitivity index that measures the relative influence of that factor on the model performance. Here, sensitivity indices are computed according to a density-based method called PAWN (Pianosi and Wagener, 2015). For each factor, say the ith, the PAWN sensitivity index is defined as
$$T_i = \max_{x_i}\,\max_{y}\,\left|F_y(y) - F_{y|x_i}(y)\right| \tag{2}$$
where $F_y$ is the unconditional distribution (CDF) of the performance metric y, i.e. the one induced by variations of all the factors, and $F_{y|x_i}$ is the conditional distribution of y, i.e. the one induced by variations of all factors but the ith, which is fixed to a nominal value $x_i$.
The rationale of Equation 2 is the following.If the unconditional and conditional distributions are very similar, it means that variations in the ith factor do not significantly affect the variability of y, and therefore, that factor has little influence.Conversely, the larger the difference between the two distributions, the more influential the input factor.This is captured by the inner maximum in Equation 2, which provides a measure of the distance between the two CDFs.The outer maximum in Equation 2 instead is used to remove the variability in the results that might arise from different choices of the nominal value x i .By taking the maximum with respect to x i , we ensure that the sensitivity index of Equation 2 is zero only if the ith factor has no influence at any point in its space of variability.
In the operational implementation of the method, the outer maximum in Equation 2, i.e. the one with respect to the conditioning value of x i, is approximated by the sample mean over a prescribed number of conditioning values (e.g. 10). For each of these, the inner maximum, i.e. the maximum absolute difference between CDFs, is approximated by using empirical distribution functions. These are obtained by evaluating the model against input samples where all input factors vary (unconditional distribution) and against samples where the ith input is fixed to the conditioning value and the others vary (conditional distribution). The PAWN method is implemented in the Sensitivity Analysis for Everybody (SAFE) Toolbox (Pianosi et al., 2015), which is freely available for academic use.
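The operational procedure can be sketched in a few lines of NumPy. This is not the SAFE Toolbox implementation: it assumes scalar input factors stored as columns of a matrix, a user-supplied `sampler`, and a `stat` argument that summarizes the KS distances across conditioning values (`np.max` mirrors Equation 2; `np.mean` or `np.median` are alternatives). All names and the toy model are hypothetical. With the defaults it uses 3000 + 1000 × 20 × n_factors model evaluations, the same budget as in the experimental set-up.

```python
import numpy as np


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at the points in `grid`."""
    s = np.sort(sample)
    return np.searchsorted(s, grid, side="right") / s.size


def pawn_indices(model, sampler, n_factors, n_u=3000, n_c=1000, n_cond=20,
                 stat=np.max, seed=None):
    """PAWN-style sensitivity indices (sketch).

    model   : maps an (n, n_factors) input matrix to (n,) outputs y
    sampler : sampler(rng, n) -> (n, n_factors) random input matrix
    stat    : summary of the KS distances across conditioning values
    """
    rng = np.random.default_rng(seed)
    y_u = model(sampler(rng, n_u))  # unconditional sample: all factors vary
    indices = np.zeros(n_factors)
    for i in range(n_factors):
        ks = []
        for x_i in sampler(rng, n_cond)[:, i]:  # conditioning values
            X_c = sampler(rng, n_c)
            X_c[:, i] = x_i                      # fix factor i, vary the others
            y_c = model(X_c)
            grid = np.concatenate([y_u, y_c])    # the ECDFs only jump at sample points
            ks.append(np.max(np.abs(ecdf(y_u, grid) - ecdf(y_c, grid))))
        indices[i] = stat(ks)
    return indices


def sampler(rng, n):
    return rng.uniform(0.0, 1.0, size=(n, 3))


def toy_model(X):
    return X[:, 0] + 0.5 * X[:, 1]  # the third factor has no influence


T = pawn_indices(toy_model, sampler, n_factors=3, seed=42)
print(T)  # T[0] > T[1] > T[2], with T[2] reflecting sampling noise only
```

Because only output values enter the KS distance, the same loop would work if a "factor" were a whole perturbed time series, provided the sampler can fix it to one realization.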
Density-based methods have a number of advantages. In the first place, they can be applied to any type of input factor, including time series of model forcing inputs or output observations, as in our study. This is not possible with other GSA methods. For instance, RSA compares the probability distributions of the input factors in the behavioural and non-behavioural groups. It thus presumes that each input factor $x_i$ is a scalar variable that can sensibly be assigned a CDF, which is not the case when an input factor is a group of parameters or a time series, as in our study. In PAWN, instead, sensitivity indices are defined based on the values of y only, as shown in Equation 2. Therefore, they can be computed regardless of the mathematical properties and meaning of the input factors. Variance-based SA methods (Saltelli et al., 2007) also possess this property. However, variance-based methods are not suitable when the output distribution is highly skewed or multimodal, because variance is then a poor measure of uncertainty (Borgonovo, 2007, 2014). Density-based methods, instead, are applicable also in those situations because they assess changes in the entire distribution of the output y rather than in only one of its moments. Furthermore, because the PAWN index is defined on CDFs, which are efficiently approximated by empirical distribution functions, its application requires a relatively limited number of model evaluations (Pianosi and Wagener, 2015).
Finally, another advantage of PAWN is that sensitivity indices can easily be tailored to focus on a subregion of output values of particular interest, for instance, below a prescribed threshold $\bar{y}$. This is achieved by simply adjusting Equation 3 as follows:
$$T_i = \max_{x_i}\,\max_{y \le \bar{y}}\,\left|F_y(y) - F_{y|x_i}(y)\right| \tag{4}$$
In our context, where the output y is a performance metric (to be minimized), the threshold $\bar{y}$ represents a minimum level of performance, and using Equation 4 means that only model evaluations that achieve that minimum performance contribute to the sensitivity indices. Using the RSA/GLUE terminology, we might call these model evaluations behavioural. However, there is a subtle difference with respect to RSA. In RSA, both behavioural and non-behavioural samples contribute to determining sensitivity. Indeed, it is the very separation between the two groups that is used to measure sensitivity. In Equation 4, instead, the separation is only used to filter out non-behavioural samples. The rationale is that, if we set the threshold to a reasonably loose value, Equation 4 will ensure that model evaluations with unreasonably large deviations from observations do not bias the SA results.
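The effect of restricting the inner maximum to $y \le \bar{y}$ can be illustrated with a sketch of the restricted KS distance, assuming empirical CDFs computed over the full samples (the function name is hypothetical). Outputs above $\bar{y}$ still count in the sample sizes, but differences among them cannot contribute to the index.

```python
import numpy as np


def ks_below(y_u, y_c, y_bar):
    """max over y <= y_bar of |F_u(y) - F_c(y)|, using empirical CDFs.

    Evaluations with y > y_bar only enter through the sample sizes, so large
    differences among poorly performing runs do not affect the result.
    """
    y_u, y_c = np.sort(y_u), np.sort(y_c)
    grid = np.concatenate([y_u[y_u <= y_bar], y_c[y_c <= y_bar]])
    if grid.size == 0:
        return 0.0
    F_u = np.searchsorted(y_u, grid, side="right") / y_u.size
    F_c = np.searchsorted(y_c, grid, side="right") / y_c.size
    return float(np.max(np.abs(F_u - F_c)))


# Two RMSE samples that agree on the good runs but differ among the bad ones
y_u = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
y_c = np.array([1.0, 2.0, 3.0, 20.0, 30.0])
print(ks_below(y_u, y_c, y_bar=5.0))    # 0.0: differences above the threshold are ignored
print(ks_below(y_u, y_c, y_bar=100.0))  # 0.4: the unrestricted KS distance
```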

Experimental set-up
In our application, we use 3000 random samples to build the unconditional CDFs and 1000 random samples to build each conditional CDF. For each of the six input factors, conditional CDFs are computed at 20 conditioning values. The total number of model evaluations is therefore 3000 + 1000 × 20 × 6 = 123 000. For each model evaluation, the procedure to generate and propagate the six sources of uncertainty through the model is as follows (Figure 2): (1) generate a time series of perturbed precipitation by multiplying the original time series by a randomly sampled multiplier (one per storm); (2) …
In our application, we set w = 15 (days), so that the window size is 31 days. As a threshold value $\bar{y}$, we use twice the mean of observed flows over the moving window, i.e.
$$\bar{y}_t = \frac{2}{2w+1}\sum_{k=t-w}^{t+w} q^{obs}_k$$
In other words, at each time step, we discard those model evaluations whose deviations from flow observations are, on average, higher than twice the mean flow over that window. As anticipated, this threshold value is quite loose, and it is only meant to prevent the analysis from being biased by a few samples with very poor model performance. All computations were performed using the SAFE Toolbox (Pianosi et al., 2015).
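The time-varying threshold and the resulting filter can be sketched as follows, assuming the same complete-window convention as for the RMSE (edges dropped); the function name and the toy numbers are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_threshold(q_obs, w, factor=2.0):
    """y_bar_t = factor * mean of q_obs over [t - w, t + w] (complete windows only)."""
    windows = sliding_window_view(np.asarray(q_obs, dtype=float), 2 * w + 1)
    return factor * windows.mean(axis=-1)


q_obs = np.arange(1.0, 8.0)           # 7 days of observed flow
y_bar = window_threshold(q_obs, w=1)  # one threshold per complete 3-day window
print(y_bar)                          # values 4, 6, 8, 10, 12

# Moving-window RMSE of two model evaluations at the same five time steps
rmse = np.array([[3.0, 5.0, 9.0, 11.0, 13.0],
                 [5.0, 7.0, 7.0, 9.0, 11.0]])
keep = rmse <= y_bar                  # True where the evaluation is retained
print(keep)
```

The set of retained evaluations changes from one time step to the next, which is why the filter is applied separately at each t.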

RESULTS
Figure 3 shows the sensitivity indices of the six sources of uncertainty for the three catchment sites.Panels on the left show the time-varying sensitivity of the RMSE computed over a time window of 2w + 1 = 31 days, while panels on the right show the aggregate sensitivity of the RMSE computed over the entire simulation period.

English River
We first analyse the top panels in Figure 3, which refer to the English River. By comparing the two panels, we notice in the first place that aggregation indeed induces a loss of information. For example, by looking at the aggregate results (top right), we would conclude that uncertainty in flow observations is the least important for this catchment; however, the time-varying results (top left) show that it can have a strong influence, although very localized on a few specific events. Our results thus confirm that GSA of aggregate performance metrics might not convey the same information as GSA of disaggregated (time-varying) metrics.
In general, parameter uncertainty is more influential than data uncertainty in this catchment. Parameters of the soil moisture accounting module are the most influential among the three groups. As expected, the snow parameters are particularly influential at those times of the year when snowmelt occurs, while they have no influence in summer, which confirms that the model's behaviour is consistent with the system's behaviour.
Interestingly, uncertainty in precipitation data does not seem to have a strong influence in this catchment.This might be due to a limitation in the assumed error model for precipitation, which only allows for variations in precipitation intensity but not in the temporal distribution of precipitation days.Allowed variations in the parameters instead are such that they might amplify or reduce timing errors, which are likely more influential than amount errors.
As anticipated, uncertainty in flow data has a relatively high influence during some specific events. To understand this better, we analysed some of those events in more detail. Figure 4 shows this analysis for the event labelled A in the top left panel of Figure 3. The top panel in Figure 4 shows the time series of observed precipitation, temperature and flow for this event. The flow peak observed around day 500 has no clear explanation in the input forcing data: there is no precipitation prior to the event, nor a significant temperature increase that could produce a large amount of snowmelt (notice that a similar increase in temperature occurring some days before did not produce any increase in flow). This event might thus be an example of disinformative data. Time-varying SA (TVSA) attributes a key role to uncertainty in flow observations because, if those observations were lower, the model performance could be significantly better, and vice versa. This is exemplified in the bottom panels of Figure 4. The left one reports, as an example, two sampled time series of flow perturbations used in our TVSA. The dashed line generates a perturbed time series in which flow observations are increased; the continuous line generates a perturbed time series in which flow observations are reduced. The bottom right panel in Figure 4 shows the conditional CDFs of the RMSE for day 500 when these two flow time series are used (while all other sources of uncertainty are varied). The red line in this figure is the unconditional CDF, obtained by varying all sources of uncertainty including flow observations. The figure shows that when flow observations are reduced (continuous line), the CDF is shifted towards the left, i.e. lower values of RMSE become more frequent. This means that, regardless of the variations in the other input factors (parameters and forcing inputs), the model would be likely to perform better if flow observations were lower. The opposite would happen if flow observations were higher. Uncertainty in flow observations thus plays a key role relative to the other sources of uncertainty in that event, which is why its sensitivity index is high. This is an example of how TVSA could be used to detect disinformative data.

French Broad River
TVSA results for the French Broad River are given in the middle panels of Figure 3. They show that in this catchment, uncertainty in precipitation data has a much larger influence than in the English River and is as influential as parameter uncertainty during some high flow events. Among the different groups of parameters, routing parameters are more influential than soil ones. A possible reason for this is that this catchment is very wet, and therefore soil parameters matter less. Furthermore, there are many small events where timing errors might relate more to the routing than to the runoff production (soil) parameters, especially compared with drier catchments.
Results for the snow parameters deserve a further comment. Because this catchment is not affected by snow accumulation and melt, the relevant module is actually switched off here. This is obtained by fixing the snow parameter TS to −∞ and all other snow parameters (CFMAX, CFR and CWH) to zero, so that all precipitation is turned into rainfall and immediately diverted to the soil moisture accounting module. Hence, when performing TVSA, the model is always evaluated with the same combination of snow parameters ([−∞, 0, 0, 0]). In principle, we should therefore obtain a zero-valued sensitivity index for the snow parameter group. In practice, this does not happen because the sensitivity index of Equation 4 is approximated using empirical CDFs, and the empirical CDFs of two different samples can differ (by a small amount) even if the underlying probability distribution is the same. Indeed, in both the time-varying and aggregate cases, we obtain very low but non-zero sensitivity values. Although physically meaningless, these values are interesting because they give us a reference to evaluate the accuracy of the sensitivity indices of the other input factors. For instance, in the middle right panel of Figure 3, the sensitivity index of flow data uncertainty is of the same order as that of the snow parameters, which means that the measured sensitivity to flow data uncertainty is within the range of numerical approximation errors and might thus be regarded as negligible.
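Why a non-influential factor still gets a small, non-zero index can be checked directly: draw the conditional samples from the same distribution as the unconditional one and compute the KS distances. This is a minimal sketch with hypothetical names, using sample sizes equal to those of the experimental set-up and a standard normal output as a stand-in for the RMSE.

```python
import numpy as np


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    F_a = np.searchsorted(a, grid, side="right") / a.size
    F_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(F_a - F_b)))


rng = np.random.default_rng(0)
# 20 conditional samples (size 1000) drawn from the SAME distribution as the
# unconditional one (size 3000), as happens for the switched-off snow group
y_u = rng.normal(size=3000)
noise_ks = [ks_distance(y_u, rng.normal(size=1000)) for _ in range(20)]
print(max(noise_ks))  # small but not zero
# Large-sample 95% critical value of the two-sample KS statistic, for reference
print(1.36 * np.sqrt((3000 + 1000) / (3000 * 1000)))  # about 0.05
```

Indices of the other factors that fall in the same range as these noise values cannot be distinguished from sampling error.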

Guadalupe River
Finally, the bottom panels of Figure 3 show the TVSA results for the Guadalupe River near Spring Branch, Texas. As with the French Broad River, this catchment is not affected by snowmelt, and the sensitivity estimates for the snow parameters are reported only as a reference to infer the approximation accuracy of the other indices. Here, uncertainty in soil parameters is by far the most influential source of uncertainty, which is consistent with the fact that the catchment is very dry; therefore, the soil dynamics, which control the separation of precipitation into runoff and evaporation, dominate over routing in the lumped model analysed here.
Similarly to the English River, uncertainty in precipitation data has a limited influence relative to parameter uncertainty. As for uncertainty in flow observations, TVSA reveals a very high sensitivity in one specific time period, i.e. the beginning of the third year of the simulation period (see letter B in Figure 3). We therefore analysed the third year of the simulation period in more detail. Figure 5 depicts the simulated and observed flows in that period and exhibits two very different behaviours in dry conditions. At the beginning of the year (days 800-850), the observed hydrograph (red line) flattens at a value of about 0.05 mm/day, while at the end of the year (days 1050 onwards), it goes to zero. In both periods, precipitation events (shown at the top of the panel) are equally infrequent and low. This evidence suggests an inconsistency in the data: days 800-850 suggest that the catchment can sustain a flow of about 0.05 mm/day even after a prolonged dry period, whereas days 1050-1096 suggest that the flow goes to zero. This is why TVSA shows a higher sensitivity to flow observations around days 800-850: just as for the English River, the most effective way to improve the RMSE in this period is by perturbing (i.e. decreasing) the observed flows.
The second most influential factor in the dry period is the group of soil parameters. To further investigate the time-varying relationship between soil parameters and model performance, we applied DYNIA (Wagener et al., 2003) to the available sample of 3000 independent Monte Carlo simulations (which we generated to build the unconditional CDFs for PAWN). As an example, the left panel in Figure 6 shows the DYNIA results for the exponential parameter BETA. Here, the colour scale represents the frequency distribution of BETA over the subsample of the top 5% performing simulations. In the dry period around days 800-850, higher performance is more frequently obtained with low values of BETA (around 1.96), while in the dry period just after day 1050, it is obtained with high values of BETA (around 5.55). The reason is that increasing BETA reduces the runoff from the soil moisture accounting component (see right panel in Figure 6), thus allowing for increased evaporation and reduced flow, while reducing BETA increases the runoff and therefore the flow. In other words, DYNIA reveals how parameter BETA can be varied to (almost) close the water balance and compensate for inconsistencies in the data.
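The mechanism by which BETA controls runoff can be reproduced with a minimal sketch of an HBV-type soil routine. The equations are assumptions based on common HBV implementations (e.g. Seibert, 1997): recharge = inflow·(SM/FC)^BETA and actual evapotranspiration = PET·min(SM/(LP·FC), 1). Apart from the two BETA values quoted above, all parameter values and names are hypothetical.

```python
def soil_step(inflow, pet, sm, FC, LP, BETA):
    """One daily step of an HBV-type soil moisture routine (sketch).

    Returns (sm, recharge, aet): updated soil moisture [mm], water passed to
    the routing module [mm/day] and actual evapotranspiration [mm/day].
    """
    # sm/FC <= 1, so a larger BETA gives a smaller recharge fraction
    recharge = inflow * min(sm / FC, 1.0) ** BETA
    sm = sm + inflow - recharge
    excess = max(0.0, sm - FC)                     # storage above FC also becomes recharge
    recharge += excess
    sm -= excess
    aet = min(pet * min(sm / (LP * FC), 1.0), sm)  # evaporation limited by soil wetness
    sm -= aet
    return sm, recharge, aet


def run(BETA, days=30, inflow=2.0, pet=1.0, sm0=100.0, FC=200.0, LP=0.7):
    sm, total_recharge, total_aet = sm0, 0.0, 0.0
    for _ in range(days):
        sm, r, e = soil_step(inflow, pet, sm, FC, LP, BETA)
        total_recharge += r
        total_aet += e
    return total_recharge, total_aet


for beta in (1.96, 5.55):
    print(beta, run(beta))  # higher BETA: less recharge, more evapotranspiration
```

With the higher BETA, less water is passed to routing, so the soil stays wetter and evaporates more, which is the compensation DYNIA exposes in the dry periods.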
This example shows how the combination of TVSA, output visualization and DYNIA can be used to understand model shortcomings and identify potentially disinformative data periods. TVSA highlights an unexpected period of parameter sensitivity, while the streamflow plot shows that the model is not capable of reproducing the observed flow. Time-varying parameter analysis then suggests how the model tries to reach the observed flow by decreasing parameter BETA. Given that the model still cannot reach the observations, it is likely that data error, rather than model structural shortcomings, is to blame.
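Using the soil runoff equation given in the caption of Figure 6 (Runoff = Prec · (SM/FC)^BETA), a short sketch shows why lowering BETA increases runoff whenever SM < FC: the ratio SM/FC is below one, so a smaller exponent yields a larger value. The precipitation and field capacity values are hypothetical; only the two BETA values come from the text.

```python
def soil_runoff(prec, sm, fc, beta):
    """Runoff from the soil moisture routine: Prec * (SM / FC) ** BETA."""
    return prec * (sm / fc) ** beta

# Hypothetical inputs: 10 mm/day of precipitation and a field capacity of 200 mm.
prec, fc = 10.0, 200.0
for sm in (50.0, 100.0, 150.0):
    low = soil_runoff(prec, sm, fc, beta=1.96)
    high = soil_runoff(prec, sm, fc, beta=5.55)
    print(f"SM = {sm:5.1f} mm: runoff {low:.3f} (BETA=1.96) vs {high:.3f} (BETA=5.55)")
```

At SM = FC the two choices coincide, so the effect of BETA is strongest when the soil is far from field capacity, as in a prolonged dry period.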

Impact of the moving window size
The last analysis we performed aimed to evaluate the impact of the chosen window size. The limited influence of precipitation data uncertainty in the English and Guadalupe catchments might be attributed to the fact that performances are averaged over a relatively large moving window (31 days), while precipitation data uncertainty might influence model accuracy on shorter timescales. To test this explanation, we repeated our TVSA using smaller moving windows, down to a minimum of 3 days (w = 1). This analysis showed that changing the window size does not significantly affect the sensitivity to precipitation errors. As an example, Figure 7 reports the sensitivity results for the extreme case w = 1 (intermediate results with 1 ≤ w ≤ 15 are qualitatively similar). Shortening the moving window increases the sensitivity to precipitation uncertainty, but the increase is rather small. We can thus conclude that precipitation uncertainty has a limited impact on model performance in the English and Guadalupe catchments regardless of the timescale considered. However, as discussed earlier, the uncertainty investigated here only concerns the intensity of precipitation data, not their temporal distribution.
Figure 7 also shows one main difference with respect to the previous sensitivity results, namely the loss of sensitivity to the snow component parameters in the English catchment (compare the row labelled snow in the top panels of Figures 3 and 7). The reason is that snow accumulation and melt are relatively slow processes, so their impact can be detected more clearly over a 31-day window than over a 3-day window. These results confirm that the choice of window size can significantly affect sensitivity estimates, as also demonstrated in previous studies (e.g. Massmann et al., 2014). While we cannot suggest a formal, objective way to define the window size a priori, we advise checking the impact of this choice by repeating the analysis for different window sizes. Here, it is important to consider the hydrological meaning of each parameter; e.g. parameters defining storage sizes require larger windows than parameters controlling quick residence times. If large differences are detected, they should either be given a physically meaningful interpretation or be investigated further, as they could help identify conceptual weaknesses in the model and/or in the GSA set-up. Note that such a posterior analysis is not computationally expensive, because calculating the time-varying performance metric for a different window size, and the associated sensitivity indices, does not require re-running the simulation model.
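A minimal sketch of why changing the window size is cheap: once the Monte Carlo simulations are stored, the moving-window RMSE can be recomputed for any half-width w directly from the stored flows. Consistent with w = 1 giving 3 days and w = 15 giving 31 days, the window here spans 2w + 1 time steps; treating it as centred on each time step and dropping edge steps are assumptions, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def moving_window_rmse(sim, obs, w):
    """RMSE over windows of 2*w + 1 time steps, one window per central time step.

    sim: (N, T) stored simulated flows, one row per Monte Carlo run.
    obs: (T,) observed flows (or one perturbed realisation of them).
    Returns an (N, T - 2*w) array; edge steps without a full window are dropped.
    """
    sq_err = (np.asarray(sim, float) - np.asarray(obs, float)) ** 2
    kernel = np.ones(2 * w + 1) / (2 * w + 1)
    # Moving average of the squared errors along the time axis of each run.
    mse = np.apply_along_axis(np.convolve, 1, sq_err, kernel, mode="valid")
    return np.sqrt(mse)

# Stand-in for stored model runs: windows are recomputed without re-running the model.
rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 1.0, size=365)
sims = obs + rng.normal(0.0, 0.5, size=(100, 365))
results = {w: moving_window_rmse(sims, obs, w) for w in (1, 5, 15)}  # 3-, 11-, 31-day
for w, r in results.items():
    print(w, r.shape)
```

The resulting performance matrices can then be passed to the sensitivity analysis for each window size in turn.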

LIMITATIONS OF OUR APPROACH AND FUTURE RESEARCH DIRECTIONS
This study demonstrates the use of TVSA to quantify the relative influence of different sources of uncertainty on the accuracy of a lumped hydrological model. While our results provide some interesting insights into how such influences vary in time and across catchments, it should be highlighted that they hold under a number of assumptions and choices that we made in setting up the GSA. These choices include the hydrological model used; the case study sites; several aspects of the experimental set-up of GSA, e.g. the simulation period, the definition of the performance metric and the selection and characterization of the uncertainty sources; and finally the choice of the GSA method itself. In this section, we discuss some of these choices and their possible implications for our results and give directions for further research.
Our analysis is applied to the conceptual HBV model in the formulation presented in Kollat et al. (2012); hence, we cannot exclude that sensitivity estimates would differ if a different rainfall-runoff model were used. Herman et al. (2013a) compare TVSA for three different hydrological models and find significant inter-model differences. However, that study considers parameter uncertainty only, not the relative influence of parameter versus data uncertainty. Also, because our approach analyses groups of parameters related to the three model components (snow, soil and routing) rather than individual parameters, using different equations to represent individual hydrological processes within those components might have a smaller impact on group sensitivities. Another interesting direction for further research would be to regard the variability of model equations as an additional source of uncertainty (so-called model structure uncertainty) and to expand our approach to assess the relative influence of structure uncertainty with respect to parameter and data uncertainty.
Another subjective choice is that of the performance metric. In this study, we use the RMSE, a metric that tends to be particularly responsive to how well the model reproduces the timing and shape of the hydrograph (e.g. Gupta et al., 2009 and references therein) and is therefore usually sensitive to the parameters of both the soil moisture accounting and the flow routing components. Other metrics might produce different sensitivities. Such differences are typically significant when using aggregate performance metrics (see, e.g., Shin et al., 2013); however, in the case of TVSA, our own experience is that this choice has a rather small impact unless the window size is very large (e.g. several months; see Wagener et al., 2003).
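The responsiveness of the RMSE to timing and shape can be illustrated with the decomposition of the mean squared error into correlation, variability and bias components discussed by Gupta et al. (2009): MSE = 2·σ_s·σ_o·(1 − r) + (σ_s − σ_o)² + (μ_s − μ_o)². The sketch below uses a hypothetical hydrograph and an illustrative helper name; it shows that a pure timing error ends up almost entirely in the correlation term.

```python
import numpy as np

def mse_decomposition(sim, obs):
    """Split the MSE into correlation, variability and bias terms.

    MSE = 2*s_sd*o_sd*(1 - r) + (s_sd - o_sd)**2 + (s_mean - o_mean)**2,
    which holds exactly when population standard deviations (ddof=0) are used.
    """
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    s_sd, o_sd = sim.std(), obs.std()
    r = np.corrcoef(sim, obs)[0, 1]
    corr_term = 2.0 * s_sd * o_sd * (1.0 - r)    # timing/shape mismatch
    var_term = (s_sd - o_sd) ** 2                 # variability mismatch
    bias_term = (sim.mean() - obs.mean()) ** 2    # volume (bias) mismatch
    return corr_term, var_term, bias_term

# Hypothetical hydrograph and a simulation that is identical but 3 steps late.
t = np.arange(60)
obs = 0.5 + 10.0 * np.exp(-0.5 * ((t - 20) / 4.0) ** 2)
sim = np.roll(obs, 3)
terms = mse_decomposition(sim, obs)
mse = np.mean((sim - obs) ** 2)
print([round(float(x), 4) for x in terms], round(float(mse), 4))
```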
As for the GSA method, a range of options has been used for TVSA in the past, including the Fourier Amplitude Sensitivity Testing (FAST) (Reusser and Zehe, 2011), regional sensitivity analysis (Wagener et al., 2003; Sieber and Uhlenbrook, 2005), Sobol' (Kelleher et al., 2013) and, in this study, PAWN. Given that the results across these studies are quite consistent with each other, we believe that this choice is also less crucial (except for very large window sizes), although we have not yet thoroughly tested this perception.
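For readers unfamiliar with PAWN, a minimal sketch of its core computation as reflected in Figure 4: the sensitivity to a factor is a summary statistic of the Kolmogorov-Smirnov (KS) distances between the unconditional CDF of the output and conditional CDFs obtained by fixing that factor. Here, the maximum over 20 conditioning values is assumed as the summary statistic; the toy model, sample sizes and function names are hypothetical, and this is not the implementation used in the study.

```python
import numpy as np

def ks_distance(a, b):
    """Kolmogorov-Smirnov distance between the empirical CDFs of samples a and b."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def pawn_index(y_uncond, y_cond_list, stat=np.max):
    """Summary statistic (default: max) of the KS distances between the
    unconditional output sample and each conditional output sample."""
    return float(stat([ks_distance(y_uncond, y_c) for y_c in y_cond_list]))

def toy_model(x1, x2):
    """Hypothetical model in which x1 dominates the output."""
    return x1 + 0.1 * x2

rng = np.random.default_rng(1)
y_u = toy_model(rng.uniform(size=3000), rng.uniform(size=3000))
fixed = np.linspace(0.05, 0.95, 20)        # 20 conditioning values per factor
s1 = pawn_index(y_u, [toy_model(v, rng.uniform(size=300)) for v in fixed])
s2 = pawn_index(y_u, [toy_model(rng.uniform(size=300), v) for v in fixed])
print(f"PAWN index x1: {s1:.2f}, x2: {s2:.2f}")
```

Because the index only needs output samples, any output (here a toy function, in the study a time-varying RMSE) can be analysed in the same way.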
Finally, one aspect that we know does have a large impact on our results is the characterization of the uncertainty sources. For example, we already discussed how the error model used to generate equiprobable time series of precipitation, which does not allow for timing errors, might have reduced the sensitivity to precipitation uncertainty. We might also expect an increased (reduced) sensitivity to data uncertainty if we increased (reduced) the variability of data errors (here set to ±40% for precipitation and ±20% for potential evaporation and flow data). Similarly, the definition of parameter ranges might be very important. For example, Kelleher et al. (2013) found that it was possible to separate parameter influences in time only after substantially reducing the uncertainty range of one parameter (cross-sectional area), because otherwise the variability of that parameter dominated the model response by producing unreasonable model outputs. In our approach, such an effect might be mitigated by the fact that we filter out output samples that do not satisfy a minimum performance target (see the discussion after Equation 3). Yet, understanding the influence of the chosen characterization of the uncertainty sources remains crucial for the interpretation and transferability of GSA results.
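To make concrete why an intensity-only error model cannot represent timing errors, the sketch below generates equiprobable perturbed series with multiplicative errors bounded by ±40%, the range used here for precipitation. The independent, uniformly distributed error at each time step is an assumption for illustration and may differ from the error model actually used in the study; the function name and rainfall values are hypothetical.

```python
import numpy as np

def perturb_multiplicative(series, max_rel_err, n_realisations, rng):
    """Equiprobable perturbed copies of a data series with multiplicative errors.

    Each value is multiplied by (1 + e), with e drawn uniformly in
    [-max_rel_err, +max_rel_err], independently at every time step (an assumption).
    Zero values stay zero and events never move in time, so this error model
    perturbs intensity only, not timing.
    """
    series = np.asarray(series, float)
    e = rng.uniform(-max_rel_err, max_rel_err, size=(n_realisations, series.size))
    return series * (1.0 + e)

rng = np.random.default_rng(7)
prec = np.array([0.0, 12.0, 3.5, 0.0, 0.0, 25.0, 1.0])    # hypothetical daily rain (mm)
prec_ens = perturb_multiplicative(prec, 0.40, 20, rng)     # +/-40% for precipitation
print(prec_ens.shape, prec_ens[:, prec == 0].max())
```

Representing timing errors would require a different error model, for instance one that shifts or redistributes events in time.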

CONCLUSIONS
In this paper, we investigate the relative importance of parameter and data uncertainty for the performance of a spatially lumped conceptual rainfall-runoff model via TVSA. We find that TVSA can reveal information on local sensitivities that would be hidden in SA of aggregate performance, and that TVSA could provide a formal method to identify periods where data might be disinformative due to observational errors. We also find that the relative importance of different factors changes across catchments with different characteristics. Routing parameters have a higher influence in a wet catchment, where the runoff coefficient is higher and the quick recession phase is longer, while soil parameters are the main source of uncertainty in a dry catchment, where estimating the amount of water lost to the atmosphere as evapotranspiration has a large influence. Uncertainty in precipitation data has a significant influence in a wet catchment, while its influence is much more limited in a snow-affected and a dry catchment. It is important to stress that the transferability of our conclusions beyond our test catchments has yet to be tested. The results are further limited by the assumptions made in our study set-up, as discussed earlier.
Hydrol. Process. 30, 3991-4003 (2016)
While these limitations are left for future research, this work demonstrates that (1) the relative importance of data and parameter uncertainty, both in time and across different places, can be formally investigated by TVSA; (2) TVSA is a generic methodology that can be tailored and applied to other case studies in a relatively straightforward way; and (3) TVSA provides interesting insights for model diagnostics, identification of disinformative data and prioritization of efforts for uncertainty reduction.

Figure 1. Schematic of the conceptual hydrological model used in this study. Model parameters are highlighted in blue. Their meaning is further explained in Table I.

Figure 2. Schematic of the sampling and model evaluation procedure.

Figure 3. PAWN sensitivity indices (ranging from 0 to 1) of RMSE for the English River catchment (top panel), the French Broad River catchment (middle) and the Guadalupe River catchment (bottom). Uncertainty sources considered by the sensitivity analysis are precipitation data (rain), potential evapotranspiration data (evap), parameters of the snowmelt/accumulation component (snow), parameters of the soil moisture accounting component (soil), parameters of the flow routing component (route) and flow data (flow). Left panels: sensitivity indices of RMSE computed over a moving window of 31 days (for the sake of clarity, only the last 2 years of the simulation period are shown). The red line is the time series of observed flow. Right panels: sensitivity indices of RMSE computed over the entire simulation period.

Figure 4. Top: observed precipitation (mm/day), temperature (°C) and flow (mm/day) for the English River catchment around day 500. Bottom left: example of two time series of the perturbation e_t applied to the flow observations around day 500. Bottom right: unconditional CDF of the RMSE for day 500 (red) and two conditional CDFs (black) obtained from the flow perturbation time series in the left panel. These are the CDFs used to compute the PAWN sensitivity index for the flow uncertainty source (for the sake of clarity, only 2 of the 20 conditional CDFs are displayed).

Figure 5. Observed (red) and simulated (grey) flow for the Guadalupe River in the third year of the simulation period. The black line shows the (reversed) precipitation data.

Figure 6. Left: dynamic identifiability analysis (DYNIA) of the soil parameter BETA for the Guadalupe River in the third year of the simulation period. The colour scale represents the frequency distribution of BETA in the subsample of the top 5% performing simulations. The black line is the time series of observed flow. Right: the effect of two different choices of parameter BETA on the runoff from the soil moisture accounting routine (the underlying equation is Runoff = Prec · (SM/FC)^BETA, where SM is the soil moisture content and FC is the field capacity).

Copyright © 2016 The Authors. Hydrological Processes. Published by John Wiley & Sons Ltd.

Table I. Parameters of the hydrological model and associated uncertainty ranges for the English, French Broad and Guadalupe catchments.

Table II. Catchment characteristics.
ables. They vary with location, with instrument, with time period of measurement, etc. A nice summary is presented by McMillan et al.