Quantifying the Uncertainty of a Conceptual Herbicide Transport Model With Time‐Dependent, Stochastic Parameters

Small streams in catchments with agricultural land use are at high risk of diffuse pollution by herbicides. Fast transport processes can cause concentration peaks that exceed regulatory requirements. These processes have a high spatio‐temporal variability and data characterizing their occurrence is often sparse. For this reason, such systems show a stochastic behavior at the resolution we observe them (same input and initial conditions lead to different output). Realistic model representations should acknowledge this pronounced apparent intrinsic stochasticity. However, a deterministic description of the physical and chemical processes at the catchment scale is state of the art in research and practice. We explore the potential of stochastic process formulations in combination with the Bayesian learning paradigm to (a) improve the quantification of the uncertainty of conceptual catchment‐scale pesticide transport models and (b) gain new mechanistic insights about the system by interpreting the temporal evolution of the stochastic processes. This is done with the help of a framework for time‐varying stochastic parameters. Thereby, we find that (a) the stochastic process formulation can lead to a more realistic characterization of the uncertainty of internal states and model output compared to the deterministic one, and that (b) the temporal dynamics of parameters resulting from the inference can highlight model deficits (and inspire improvements) such as a better sustained baseflow in dry periods. We also identify two key challenges: numerical difficulties in sampling the posterior and the question of where to introduce and how to constrain the additional degrees of freedom such that they are not misused.

In light of the considerations on stochasticity and flexibility in the two preceding paragraphs, let us look at the fundamental distinction between deterministic process model (DPM) and stochastic process model (SPM). As DPMs, we classify models that rely on process descriptions that yield the same result when provided with the same input, parameters, and initial states. In such models, a stochastic error term is usually added to the output of the underlying purely DPM. SPMs, on the other hand, are more flexible and account for stochasticity in internal model states (as opposed to just the output), for example, through stochastic differential equations, stochastic parameters, or stochastic input. While the latter allows to account for input uncertainty, the former two are possible approaches to consider model structural uncertainty. SPMs are a promising technique to combine the need for increased flexibility with the interpretability provided by process-based (mechanistic) approaches. A more detailed definition of DPMs and SPMs can be found in Section 2.
A large part of the previous studies dealing with uncertainty quantification of WQ models at the catchment scale fall into the category of DPMs. For example, Zheng and Keller (2007) account for input, parameter, model structure, and observational uncertainty through an additive error model in combination with the GLUE approach (Beven & Binley, 1992). Bertuzzo et al. (2013) quantify parametric uncertainty of a conceptual WQ model to quantify herbicide concentration dynamics in a small catchment. Bayesian approaches have been identified as especially a promising technique in dealing with uncertainty of WQ models (Rode et al., 2010). Previous studies have proven the value of Bayesian approaches to quantify uncertainties of WQ models (e.g., Gardner et al., 2011;Hantush & Chaudhary, 2014;Hong et al., 2005;Raat et al., 2004;Talamba et al., 2010;Wellen et al., 2014). Bayesian frameworks have also been developed (Schoups & Vrugt, 2010) or applied in WQ modeling studies . Han and Zheng (2018) present a Bayesian framework for jointly considering input, parametric and residual uncertainty based on a perturbation of input pollutant loading, acknowledging the large input uncertainties typically present in WQ modeling studies. While these developments have certainly advanced the assessment of uncertainty of WQ models, they fail to acknowledge the fundamental inherent stochasticity of the systems we deal with.
SPMs, on the other hand, have been rarely applied in WQ studies so far. Beck and Young (1976) illustrate the potential of stochastic, time-dependent parameter methods in identifying model structures of simple river WQ models, while state assimilation techniques have been used at much larger scales in river WQ modeling (e.g., Kim et al., 2014;Loos et al., 2020). SPMs are common in purely hydrological applications, where it has been argued that the assumption of deterministic behavior of a catchment is "indefensible" (Kuczera et al., 2006). The mentioned study explains in detail why this is the case and makes an argument for what we call SPMs. In hydrology, SPMs are mostly used in connection with a class of methods called "data assimilation." These methods aim at reconciling model states, parameters, or structures with observations to obtain more accurate estimates of model states and parameters including their uncertainty (Liu & Gupta, 2007). For an overview of the large number of previous studies on this topic, the reader is referred to Liu and Gupta (2007) and Liu et al. (2012). While many approaches estimate either states or parameters separately, it is generally recognized that a joint estimate is conceptually preferable (Liu & Gupta, 2007;Vrugt et al., 2005). Various approaches for the joint estimation of parameters and states have been achieved (e.g., Moradkhani, Hsu, et al., 2005;Vrugt et al., 2005); an interdisciplinary overview is provided by Kantas et al. (2015). However, many of the previous approaches are based on stochastic states and parameters, but fewer studies rely on the concept of time-varying stochastic parameters (e.g., Pathiraja et al., 2016b;Rajaram & Georgakakos, 1989;Reichert & Mieleitner, 2009;Smith et al., 2008). Even though the two approaches are technically very similar, the concept of time-varying parameters is more satisfying as it conserves mass balances and therefore respects fundamental physical principles. In this study, we rely on the approach for stochastic, time-dependent parameters suggested by Reichert and Mieleitner (2009) since it is fully compatible with the Bayesian learning paradigm by providing an approach to sample from the full joint posterior distribution of constant and time-dependent parameters. The approach taken here can be classified as a "smoothing" approach according to McLaughlin (2002), since the posterior of the quantity of interest at each time point is calculated conditional on all (past, present, and future) observations; it is therefore used offline in batch mode. Filtering approaches, which only condition on past and present obser-AMMANN ET AL.  (Yang et al., 2018).
The objectives of this study are the following: • Explore the potential of SPMs of conceptual nature for the characterization of herbicide transport in small headwater catchments. Investigate whether this approach can reveal mechanisms in the underlying system that have not been accounted for so far. • Check whether incorporating the identified patterns in time-dependent parameters into the deterministic part of the model structure improves the likelihood compared to the original model. • Assess the uncertainty estimates provided by the SPMs for quantities that are of interest in WQ problems, like high-frequency dynamic fluctuations of herbicide concentrations in a small catchment with agricultural land use.

Definition of Deterministic and Stochastic Process Models
For DPMs, the states, y (note the use of boldface for vectors), are given by the model, m, which is a function of the parameters θ: Note that we omit the input from the notation, since statistically speaking, it is not fundamentally different from the parameters; both are quantities that affect the states and the output of the model. By model output, we mean the model's estimate of the true value of the observed state (e.g., streamflow). Let us assume that the actual observations, y obs , are a specific realization of the random variable Y obs (note the use of capital letters for random variables), which is either given by where o(⋅) is a function mapping the model states to the outputs and E o is a stochastic term that describes the effect of all sources of errors (including measurement errors) on Y obs in a lumped way. Since errors are often heteroscedastic (e.g., Clarke, 1973), E o depends on y and on error model parameters, ψ. Alternatively, the transformation h(⋅) (e.g., Box-Cox) may be applied (Equation 2b), which ideally leads to a homoscedastic error term, E h , that does not depend on the states anymore. However, the error tends to have other complex and unfavorable characteristics such as nonnormality (e.g., Schoups & Vrugt, 2010), and autocorrelation (e.g., Aitken, 1973;Beven & Westerberg, 2011;Kavetski et al., 2003). Accounting for these characteristics can be difficult (e.g., Ammann et al., 2019;Evin et al., 2014;McInerney et al., 2017;Schoups & Vrugt, 2010).
For SPMs, on the other hand, the states, Y, are random variables obtained by the stochastic model M: and the observations are characterized as a deterministic or stochastic function, o(⋅), of the states: where Z, describes the residual stochasticity, which considers the remaining uncertainty that is neither accounted for in the process model, M, nor in the function, o, which maps the states to the output. Ideally, Z represents the measurement error only, which is the case if all other sources of uncertainty have been accounted for in M and o. Equation 4 has conceptual advantages over Equation 2. First, it is preferable to account for the errors where they actually originate: in the internal states and fluxes of the model. This is a more accurate description of many environmental systems, for which the uncertainty in the modeled states and fluxes is often larger than the measurement uncertainty of the output. Second, calibrated optimal temporal dynamics of internal states or parameters can be used to inspire model structural improvements AMMANN ET AL.

Methods
The procedure applied in this study is largely based on the approach for stochastic models presented in (Reichert & Mieleitner, 2009;Reichert et al., 2020). After a short introduction to the study site and the data (Section 3.1), we describe in detail how that approach is applied in this study by (a) introducing stochasticity in a previously deterministic model by making some of its parameters stochastic, time-dependent quantities (Section 3.2), (b) screening for patterns in the inferred temporal dynamics of time-dependent parameters (Section 3.3), (c) based on the identified patterns, formulating model improvements to eliminate systematic model deficits (Section 3.4), and (d) subsequent prediction for uncertainty quantification (Section 3.5). Finally, Section 3.6 provides some details of the numerical implementation of the model and the sampler.

Study Site and Data
The modeled study catchment is the Ossingen experimental catchment, located in north-eastern Switzerland. It has a size of approx. 1.2 km 2 with predominantly agricultural land use (75%, crop production) and a subdued topography with an average slope of 4.3° (Doppler et al., 2012). While the dominating soil type in the catchment is a well-drained cambisol, poorly drained gleysols can be found in the center close to the stream (FAL, 1997). The average precipitation is around 900 mm/yr (MeteoSwiss, 2016). The catchment is rather typical for the Swiss Plateau regarding the mentioned characteristics. In the growing period of 2009, Doppler et al. (2012) conducted a controlled application of various herbicides in the Ossingen catchment and measured precipitation (10 min resolution), streamflow (5 min resolution), in-stream herbicide concentrations (15 min resolution during elevated streamflow), and various additional relevant states like groundwater levels and concentration of herbicides in the top soil layer. A more detailed overview of the experiment and the catchment characteristics is given in the mentioned study.
The model input and observation data we use for calibration is a preprocessed subset of the data obtained by Doppler et al. (2012) and it is the same that was used by Ammann et al. (2020). The observations used for calibration extend from March 1 to October 12, 2009 and involve measurements of streamflow, in-stream concentrations of the two herbicides atrazine and terbuthylazine, and soil/water distribution coefficients of atrazine. High-frequency in-stream concentration measurements are available during five major precipitation events that differ considerably in volume, intensity, and dominant runoff processes that they cause (Doppler et al., 2012). The variability in the characteristics of the rainfall events is expected to increase the general validity of the results. The overall resolution of the preprocessed input data (precipitation, potential evapotranspiration) is 15 min. A more detailed description of the data can be found in Ammann et al. (2020).

Probabilistic Model Based on Stochastic, Time-Dependent Parameters
We base our analysis on the deterministic conceptual model of herbicide fate and transport that has been suggested based on experimentalist knowledge for the Ossingen catchment. The model is described in detail in Ammann et al. (2020), where it is denoted as "MexpH4." Let m(t, θ) be the model function, which defines how the model states, y, evolve in time depending on the parameters θ ( Table 1). The states consist of multiple reservoirs that represent fast and slow responding elements of the catchment in terms of water and herbicide transport. For example, the "groundwater" reservoir (representing an ensemble of slow transport processes) defined by the state S g y (for ease of notation we use X instead of y X in the following) is given by:

Water Resources Research
where Q g, in is the inflow to that reservoir,  g k and   g are part of θ, o(⋅) is the function that maps the states to the observations, and "… " denotes that this function depends on other states and parameters not shown here. Note that, because the model is conceptual in nature, its states and parameters are not directly measurable quantities but they are related to key states and processes in the catchment. For example, S g represents the water content of the part of the saturated zone that is connected to the stream. See Ammann et al. (2020) for a more detailed explanation of the meaning of the individual parameters and states of the model.
In order to turn m into a stochastic model that can be written in the form of Equations 3 and 4, we make use of a previously developed framework for time-dependent parameters (Reichert & Mieleitner, 2009) which turns (a subset of) model parameters that were previously constant in time into a time-dependent stochastic process, Θ s (t, ξ s ). This leads to where Y are the model states, Y o is the model output that is comparable to the observations, Y obs , is a random variable characterizing the observations, s is the index of the stochastic parameter, and θ −s = {θ i } for all i ≠ s. The probability distribution of Y o describes our knowledge about the system's states represented by the model output variables. This knowledge is uncertain due to incomplete (epistemic) knowledge of model parameters as well as due to stochastic (aleatory) system behavior. As we describe both of these types of uncertainty probabilistically, we can combine them in a Bayesian approach despite the fact that this includes uncertainty of frequentist nature (Reichert et al., 2014 Note. For each inference setting, one of them is turned into a time-dependent stochastic process and fitted jointly with all the others. Note that, to limit the computational effort and to avoid potentially excessive identifiability problems, only one parameter is time-dependent in each inference setting. For the stochastic process, Θ s (t, ξ s ) and its realization θ s (t), we use an Ornstein-Uhlenbeck (OU) processes (Uhlenbeck & Ornstein, 1930), which is defined by parameters ξ s = {μ s , σ s , τ s } (mean, asymptotic standard deviation, and characteristic correlation time, respectively), and where W(t) is a Wiener process (i.e., random walk in continuous time).
The residual error model, Z, is formulated for the different types of measurements (indexed by j) to which the models are calibrated; streamflow data, concentration measurements, and soil water distribution coefficients (see Ammann et al., 2020 for detailed information on the data). For the error of a measurement of type j at time t, we assume that where N is the normal distribution and σ Z,j (t) is its standard deviation: where ψ j = {a j , b j , c j } are the parameters of the residual error model for observations of type j and y 0,j is a scaling constant. Equation 10 is equivalent to the formulation used in Ammann et al. (2020) with the difference that we assume no temporal autocorrelation in Z(t), since it should mostly represent the random component of the measurement error. Note that we cannot specifically account for systematic measurement errors such as the ones caused by faulty rating curves.
In our case, the model states, Y, do not require a probabilistic formulation since they are a deterministic function of the parameters. Assuming independence of the priors of ξ s and ψ, the joint distribution of parameters and potentially observed states, y obs , is given by: where obs Y f is the likelihood conditional on the time-dependent and constant parameters, which arises from the assumed statistical properties of the measurement error, Z (Equation 9). obs Y f is based on the likelihood framework proposed by Ammann et al. (2019) and its details for the specific application to the Ossingen data are given in Ammann et al. (2020). f OU (⋅|ξ) is the probability density of the discretized OU-process given parameters ξ, and f pri is the prior distribution. The latter consists of independent distributions for most of the individual parameters and we use hard boundaries on all of the marginals (see Appendix A). Regarding the prior for ξ s , note that τ s is fixed for all s due to theoretical difficulties in estimating that parameter (Roberts & Stramer, 2001). Liu and Gupta (2007) rightly argue that temporal variations of the parameters should be slower than those of the states, which reduces the peril of producing output response to input solely by the temporal dynamics of the parameter and not the states. This is why we fix τ s at a relatively large value of ca. 10 days. μ s and σ s are inferred. Out of ψ, a is inferred and the other elements are fixed (see Appendix A).
According to the Bayesian inference paradigm, we learn about the parameters by conditioning Equation 11 on the variable y obs and by substituting it with actual observations: where the normalizing factor f y (y obs ) is constant w.r.t. the parameters. The procedure used to sample from f post is described in Section 3.6.
AMMANN ET AL.

Water Resources Research
In order to keep the time-dependent parameters in reasonable ranges, and for reasons of numerical efficiency of the sampler, we redefine the time-dependent parameter as: where u s (⋅) is a scaled and shifted logistic function for s = D, and equal to the exponential function otherwise,  s is a time-constant parameter and  * ( ) s t is a dimensionless time-dependent parameter. The logistic function is used to asymptotically reach the relatively narrow bounds within which the prior for θ D is nonzero (see Table A1). Through Equation 13, we hope to gain numerical efficiency, since the time-constant  s becomes part of θ −s , which allows for θ s (t) to be additionally updated in the Metropolis sampling step in Equation 17, as opposed to only in Equation 19. The mean of  * ( ) s t is kept fixed at 0 for the inference.

Finding Patterns With Local Regression and Cross-Correlation
Once the posterior distribution of θ s (t) has been obtained for each parameter s, potential temporal patterns in θ s (t) are of interest, since they could indicate possibilities to improve our deterministic model. In other words, such patterns would indicate that the introduced stochasticity has been misused to compensate for systematic model deficiencies. Such deficiencies would need to be corrected before a meaningful interpretation of the predictive uncertainty can be made. For example, if the evaporation multiplication factor was found to correlate with temperature, the parameterization of the potential evaporation would have to be re-evaluated.
where g s is a function of external influence factors, x, and of a vector y g that contains only a subset of all the states y (see Table 2). ϵ i are random deviations. If g s can approximate * s θ well, we have a chance of improving our model m by (i) using the empirical model g s as is (i.a) or in a parameterized form (i.b) to replace some constant model parameters by their estimated temporal dynamics, or by (ii) reasoning about the deficits in physical process representation that cause the pattern g s and adapt the process representation of m accordingly. In this paper, we choose approach (i.a). An example for (ii) would be the extension of the model structure by adding a new reservoir after identifying deficits in the constitutive function of an existing reservoir.
For the function g s , we choose a weighted local regression model according to Cleveland et al. (1992) under the assumption of independent Gaussian deviations, ϵ i , with a polynomial of order two, and a smoothing parameter equal to unity. As arguments of the function g s , we use the features in Table 2 individually and in all possible combinations of two. We refrain from combining three or more features due to the limited benefit of going from one to two features (see Section 4.1). Weights, w i , are introduced for the least squares regression to account for the fact that the identifiability of the parameters (and thus the information content in the posterior) varies over time. The weights are given by standard deviation of the posterior distribution of the dimensionless time-dependent parameter  * s at time point t i (visible e.g., in Figure 1) and σ min is a minimal standard deviation introduced to prevent overly large weights of time points where the parameter is at the boundary. For the purposes of this study, we set σ min equal to 5% of the mean of the posterior of σ s . The weighted R 2 ,    In panel (k), the soil/water distribution coefficient, K d , is based on time-dependent  S t, z1 and constant  S t, z2 , while it is vice versa for panel (l). The 90% credible interval inferred with all parameters being constant (i.e., with the deterministic model m(θ)) are given by the horizontal lines for reference.

Water Resources Research
In addition, we scan for potentially time-shifted dependencies between  * ( ) s t and {x(t), y g (t)}, by calculating for each element k of the external influence factors and the states, where corr(⋅, ⋅) denotes the correlation and t m is 2 weeks.

Model Improvement Based on Patterns Found in Time-Dependent Parameters
The deterministic empirical relationships between ˆs θ and the influence factors, x, and model states, y g , are used to improve the predictions made by model m by constructing a "hybrid" model: where g s is the optimal empirical model (identified as described in Section 3.3) approximating the temporal dynamics of  * ( ) s t , and y g, hyb consists of a subset of the states, y hyb , of this model. As an alternative and more flexible approach, one could choose another function that is based on g s but has additional parameters, which could then be fitted together with θ −s .
The posterior distribution of θ −s of the hybrid model is then obtained according to the procedure in Section 3.6. Note, however, that the time-dependent parameters of the hybrid model,   * ( ) s t , are not stochastic, but given by   y and therefore need not be inferred (the sampling steps in Equations 18 and 19 are not necessary). Remember that the multiplication factor,  s , is part of θ −s , to still allow for an overall adjustment of the level of the parameter with index s. During inference, y g, hyb might reach values that were not reached in connection with ˆ( ) s t (based on which the function g s was fitted). In that case, y g, hyb is truncated so that it agrees with the support of g s .

Prediction
The prerequisites for a meaningful interpretation of the uncertainties estimated with time-dependent parameters is that the stochasticity provided by  * ( ) s t is not misused to compensate for systematic deficits of the model. In addition to the procedure described in Sections 3.3 and 3.4, this should be verified by pseudo-/ predictive cross validation, which is described in the following paragraphs.
The estimated uncertainty of the state variables of interest are obtained by propagating the joint posterior distribution obtained for θ −s , ξ, and ψ though the model. In the following, we distinguish between "intrinsic" and "residual" stochasticity. The former means all the uncertainty that is accounted for within the process model (i.e., through the constant parameters θ −s and the time-dependent parameters Θ s (ξ) in Equation 7a), including all the uncertainty this causes in the internal states and the output. Residual uncertainty, on the other hand, is characterized by the term Z(ψ) (Equation 7c), which includes the effect of all additional sources of uncertainty not accounted for within the process model, such as measurement uncertainty of the output or some remaining structural uncertainty. For visualization, we estimate intrinsic stochasticity on one hand, and combined intrinsic and residual uncertainty on the other hand by propagating the posterior distributions of (θ −s , ξ) and (θ −s , ξ, ψ), respectively, through the model given in Equation 7.
Note that in this study, due to the uniqueness of the data set and the limited period of high-frequency concentration measurements, the prediction period is the same as the calibration period (see Section 3.1), leading to the term "pseudo-prediction," which is explained in more detail below. This prevents us from quantifying the prediction error to be expected for an independent time period. Nonetheless, the approach is suitable to avoid overfitting and to confirm that the additional degrees of freedom are not used to reproduce variability that should be produced by the deterministic part of the model. When "pseudo-predicting," we neglect the posterior knowledge about the temporal dynamics of  * ( ) s t (even though it would be available in our pseudo-prediction period) to mimic prediction during a time period that is independent of the AMMANN ET AL.
10.1029/2020WR028311 9 of 27 calibration period. Therefore, like in a true prediction setting, the  * ( ) s t used for the pseudo-prediction are random realizations of an OU-process with parameters ξ taken from the posterior sample. This means that the many degrees of freedom introduced through the time-dependent parameters, which constitute the main part of the danger of overfitting, are not used for prediction. Furthermore, we focus on the characterization of the resulting predictive uncertainty and its division into intrinsic and residual uncertainty, which is independent of whether we predict during the calibration or a validation period.

Numerical Implementation of Model and Sampler
The original model, m(t, θ), as well as the adapted model, m(t, Θ s (t)), which accepts time series of parameters, are implemented with the SUPERFLEX framework for conceptual hydrological models (Fenicia et al., 2011). In SUPERFLEX, the evolution of the internal states are governed by differential equations   ( , , ) f t y y θ and the states, y(t i ), are obtained based on y(t i−1 ) with the implicit Euler method on the grid t i = t 0 + iΔ t with a fixed step size Δ t of 15 min in our case. In the modified version with time-dependent parameters, θ s (t i ) is assumed to be constant from t i−1 to t i . The output for the observation time points, The hybrid model is implemented as an iterative procedure where the k-th step consists of evaluating x y for all time points. The convergence criterion was chosen to be the maximal (over all time points) relative difference in y g, hyb from one iteration to the next of below 1%.
We sample from f post (Equation 12) according to the procedure developed by Buser (2003) and Tomassini et al. (2009). The ideas underlying this approach are described in detail in Reichert and Mieleitner (2009) and Reichert et al. (2020). The approach is based on Gibbs sampling across constant model parameters, parameters of the OU process that governs the time-evolution of the stochastic parameter, and the timecourse of the stochastic, time-dependent parameter. In the kth step of this Gibbs sampling approach, we sample , Sampling of the constant parameters,  ( , ) k k s θ ψ and k s ξ , within this Gibbs sampling scheme is done by Metropolis sampling with automatic adaptation of the proposal distribution during an initializing phase that is discarded as part of the burn-in phase. The stochastic parameter,  *, ( ) k s t , is proposed by sampling from the OU-process on sub-intervals (i.e., arbitrary sections of the whole simulation period) conditional on the values at the endpoints of that interval and accepting or rejecting based on the observation likelihood. By adapting  * ( ) s t only within sub-intervals, we reach a higher acceptance rate, since the new proposal for  *, ( ) k s t is largely similar to   *, 1 ( ) k s t and only differs within one sub-interval. This leads to faster convergence. On the other hand, one has to be aware that the split into sub-intervals increases the number of model runs required to complete one step of the outer Gibbs sampling procedure. In the present application, for most parameters that we made stochastic, we chose 50 sub-intervals, which results in a reasonably high acceptance rate for most time-dependent parameters. However, for some parameters that led to a relatively low acceptance rate during inference, we chose 100 sub-intervals. Convergence is assessed visually based on the Markov chains for the constant parameters, some selected time-points of the time-dependent parameters, the observational likelihood, obs

Water Resources Research
We use the R Language for Statistical Computing (R Core Team, 2019) for all the computations and visualizations, except for the hydrological process model, m, which is written in Fortran. For the sampling procedure, we use the R-package "timedeppar" .

Results and Discussion
Convergence was achieved in most inference cases with time-dependent parameters and in the case of all parameters being constant in time (see supporting information). Convergence problems were observed for the time-dependent  t,max S and  u,max S (the results of which are not shown), and for the infiltration excess parameter  ex P . The latter parameter is expected to be difficult to infer due to its threshold-like behavior and the fact that its effect on the output is restricted to very limited time periods. Inference of these parameters was, however, successful in the cases in which they were part of the constant parameters estimated jointly with other time-dependent parameters. No grave convergence problems were encountered for the other cases of time-dependent parameters and when inferring all parameters as constant in time.

Inferred Dynamics of Time-Dependent Parameters and Identified Patterns
For each inference approach with time-dependent parameter s, we obtain a posterior uncertainty estimate of    s s s t , ( ), * and . The resulting temporal dynamics of θ s (t) (obtained from  * ( ) s t acc. to Equation 13) show distinctly different characteristics for the different time-dependent parameters s (Figure 1). While some parameters show rather high-frequency fluctuations (Figure 1; e.g., panels a, c, n), others vary on longer timescales (Figure 1; e.g., panels g, h, k, l). They also differ in the relative width of their distribution at each time point compared to the amplitude of the temporal fluctuations. For example, panels (i) and (k) in Figure 1 show parameters with a wide uncertainty band relative to the amplitude of their temporal dynamics. Also, different parameters show different relative identifiability (indicated by the width of their distribution) during different time periods. This is especially pronounced for the parameter θ D (Figure 1a), which controls the split of the fluxes into a fast and a slow response; this parameter is only identifiable during precipitation events and reverts to the prior otherwise. Similarly, the release rate  i k of the reservoir representing the impervious areas shows a smaller uncertainty (and also lower values) during three distinct time periods (Figure 1d). Most of the parameters result in similar ranges of the marginal posteriors when time-dependent compared to the case of all constant parameters (Figure 1), which means that the interpretation regarding process representation of the individual parameters is consistent for the case of constant and time-dependent parameters. Exceptions are  c k ,  d k , and   u . For the first two, the reason for this is that  s goes to the lower boundary of the prior, which is the same as for the corresponding parameter in the case of all-constant parameters. Lower values for θ s (t) are then reached by shifting  * ( ) s t to values predominantly below 0.
For the case of s = k g , Figure 2 shows the corresponding uncertainties in model output (streamflow and atrazine concentration), internal states (level of the groundwater reservoir, S g ) and the temporal dynamics of  g ( ) k t in the calibration setting. Of the two parameters ( g k and   g ) that lead to the largest increase in the likelihood (see Figure 3a), The inferred temporal evolution of  g k (Figure 2d) has very similar characteristics as the one of   g (see supporting information). In particular, both time series show a reasonable amplitude of the fluctuations. This is opposed to most other parameters controlling the hydrological processes (see Figure 1), for which the variability (in terms of σ s ) is larger than expected to a degree that must be seen as unreasonable. Note that the two parameters without exaggerated variability are the ones that cause the largest increase in likelihood when being made time-dependent (Figure 3a). This indicates that, in case of structural model deficits, additional degrees of freedom introduced w.r.t. θ s can be misused (resulting in a large temporal variability of θ s ) to compensate for deficits unrelated to θ s .
A comparison of the resulting dynamics of the time-dependent parameters when performing inference on streamflow data only, as opposed to the full data set, can reveal further interesting insights w.r.t. the sensitivity of the parameters to the different data types at different points in time. For example, Figure C3 in the appendix shows that the parameter for the shortcut reservoir,  c k , is mainly informed by streamflow data, except during the major precipitation event in the end of May, which was the first one after spraying AMMANN ET AL.

Water Resources Research
of herbicides. This shows that the model relies on this fast shortcut reservoir to produce the observed concentration peak. The equivalent comparison for all other parameters is given in the supporting information.
When approximating   s t * ( ) as a local regression of the features {x(t), y g (t)}, we obtain values of weighted R 2 between close to 0 and 0.61 for the one-dimensional local regression ( Figure 3b) and up to 0.72 for the two-dimensional local regression ( Figure C2). A relatively large R 2 is found for the one-dimensional regression (0.6) for  * g k as a function of the level of the groundwater reservoir, S g (Figure 4a). The identified relationship is "banana-shaped," with higher values of the parameters coinciding with the lowest groundwater levels. The pattern for   * g is similar (Figure 4b). There also seems to be a tendency for  g k and   g to increase again to very high groundwater levels (Figures 4a and 4b), however, this is only supported by a limited amount of data (one event only). The temporal patterns of  g k remain largely unchanged in preliminary trials where it is inferred jointly with other time-dependent parameters. This increases confidence in the generality of the results described above. A closer look at the cause of the pattern in Figures 4a and 4b reveals that, in particular, April and August 2009 were very dry months, which caused the groundwater reservoir to fall below 10 mm (Figure 2).  g k and   g increase during that period to produce the relatively large baseflow still observed in the stream. This suggests that the catchment's baseflow is less affected by dry periods than can be represented with the static parameterization of a single nonlinear reservoir chosen in the model.
In order to allow for more sustained baseflow in dry periods, one could think of several model improvements, for example, an additional (slower) reservoir or an adaptation of the constitutive function of the existing groundwater reservoir. An advantage of the first option, is that it is effective also in time periods during which the existing reservoir is empty, while an advantage of the second option is the possibility of AMMANN ET AL.  } (x-axis) resulted in which R 2 for which time-dependent parameter (y-axis). The y-axis is ordered according to the maximal R 2 achieved with the one-dimensional local regression, the x-axis is ordered according to which feature reaches the highest average R 2 for all parameters.

(a) (b)
keeping the number of additionally introduced parameters low (or even equal to zero). We refrain from introducing an additional slower reservoir, which would require at least two new parameters (a split and a release rate), since based on visual assessment, the baseflow is already reasonably well fitted even with the DPM with constant parameters (see Figure C1 in the appendix). Instead, we choose to adapt the constitutive relation of the existing groundwater reservoir as described in Section 3.4.
An elevated value of R 2 is also obtained for the parameter  i k , which controls the response of the impervious areas. Figure 4c shows a surprisingly clear-cut threshold at S i ≈ 8 mm below and above which we tend to get small and large values of  i k , respectively. Note that the limited flexibility of the regression function does not allow for a good approximation of this pattern. The pattern is interpreted as an activation of additional flow paths with a slower response time (compared to the very fast responding impervious areas) in cases where the reservoir reaches a sufficiently high level. For example, these might be paths including compacted gravel roads, which are abundant in the study catchment, and which tend to react slightly slower than completely sealed surfaces.
The other strong correlations visible in Figure 3b are judged to be of less interest, since the pattern for the pair (k d , S g ) is dominated by a single event, (S t, z2 , S u ) and (r s , S d ) are judged to be coincidental, and (k c , S g ) is similar to (k i , S i ). As an example for a potentially coincidental pattern, we discuss the pair (S t, z2 , S u ) in more detail in Appendix B.
AMMANN ET AL.

Water Resources Research
Regarding the cross-correlation analysis, we find correlations of around ρ = 0.6 or weaker for all the combinations of  * s and x (k) or ( ) g k y . Many of the strongest correlations have already been discussed above or do not have an obvious interpretation that is physically meaningful. One potential exception is the exponent governing the nonlinearity of the response of the unsaturated reservoir,   * u , which is negatively correlated (ρ = −0.6) to the water content of the reservoir representing the unsaturated zone, S u , 10 days earlier. However, this finding is more an example of how an expert's reasoning can be stimulated and does not prove a causal relation between the two quantities. For that, we are missing longer time series of observations, which would enable a more robust analysis.
In summary, this analysis reveals that there are meaningful patterns in the inferred temporal dynamics of the parameters that allow for interpretation by the domain expert, which can inspire potential model improvements. This confirms the findings of previous hydrological modeling studies, which have found patterns in output innovations when estimating states (Vrugt et al., 2005) or patterns in the temporal dynamics of time-dependent parameters (Pathiraja et al., 2016a), and extends them to a WQ modeling case study. However, not all the identified patterns might be physically meaningful; a potential example is the relation between  t,z2 S and S u . This shows that such patterns need to be interpreted with great caution, especially if they are based on limited data that does not allow for an independent temporal validation.

Model Improvement by Incorporating Found Patterns
The patterns shown in Section 4.1 that are most promising for improving the DPM are the ones identified for  g k and   g . This is due to their relatively large effect on the observational likelihood ( Figure 3a) and due to the rather good interpretability of the mechanisms that cause the identified patterns. Table 3 shows an expected result; when estimating the dynamics of the time-dependent parameters through regression (with the hybrid model , column 3) the observational likelihood is higher than the one reached with time-constant parameters (with base model m(θ), column 1), but lower than the one reached when estimating the dynamics directly from the data (with the stochastic model m(θ −s , Θ s (t)), column 4). This is true for both, s = k g and s = α g . In both of these cases, however, the hybrid model is closer to the deterministic than to the stochastic model, which indicates that exploiting the information contained in the temporal variation of the time-dependent parameters leads to a rather limited model improvement in this case. This means that most of the variation (in terms of likelihood improvement) of  * ( ) s t is probably "true" stochasticity and only a minor part is caused by the systematic deficit revealed in Figures 4a and 4b. The predictive uncertainties (Section 4.3) obtained based on the inferred σ s for those parameters are therefore valid. For an example where the revealed deficits and the inspired improvements are more significant, see Reichert et al. (2020).
Note that we opted for a static function, g s , to improve the parameterization of the reservoir representing groundwater. More flexibility could have been traded for less accuracy by choosing a more general adaptation of the constitutive relationship of the reservoir, thereby introducing additional parameters. This might have led to a larger (or a smaller) increase in obs   (28) Note. The result for the original model with no time-dependent parameters is also given for reference.

Estimated Uncertainties Based on Bayesian Posterior Distributions
When using the DPM, the predictive uncertainty for streamflow and herbicide concentrations is dominated by the residual stochasticity, which is much larger than the parametric uncertainty ( Figure 5). The combination of the two yields error bands of reasonable width for the type of observations, j, for which we formulated an error model according to Equation 10. The unobserved states, for which one did not formulate a residual error model, for example, S g , have only parametric uncertainty, which seems too small, especially compared to the uncertainty of observed states like streamflow ( Figure 5). Furthermore, the stochastic realizations of model output for an observed state, say streamflow, have to be produced by sampling the residual stochasticity, since it combines multiple relevant sources of uncertainty. However, this leads to model realizations with unreasonably weak temporal short-range correlation compared to the observations (see the first and second row in Figure 5).
Regarding the predictions with the SPMs, we focus on the one with time-dependent parameter  g k (Figure 6), since it led to the largest increase in the observational likelihood. The results regarding predictive uncertainty that we obtain for    g s (see supporting information) are similar due to the similar structural function of the parameters. The predictive uncertainties obtained for all other parameters made time-dependent are given in the supporting information. Figure 6 shows that the total output uncertainty is dominated by the intrinsic stochasticity whereas the residual stochasticity is rather small. This is opposed to the dominant role of the residual stochasticity for the deterministic model ( Figure 5). Internal model states, such as the level of the groundwater reservoir, show a larger (probably more realistic) uncertainty in the case of the stochastic model ( Figure 6)   deterministic model ( Figure 5). This is of practical relevance whenever we are interested in such internal states. The uncertainty that characterizes our knowledge of states, output, and observations is much smaller during calibration (Figure 2) than during prediction ( Figure 6) when relying on the SPM, which is a very intuitive result. This is not the case for the DPM. Furthermore, with the stochastic model, we can easily obtain realizations of the states that have both, a reasonable autocorrelation and a reasonably large uncertainty (green dashed line, Figure 6). With DPMs, a similar result would require considering autocorrelation in the residual error term, which has been shown to be problematic (e.g., Evin et al., 2013), especially for high-frequency data (e.g., Ammann et al., 2019) like the one used in this study. Note that another study successfully considers autocorrelation in the residual error term and finds stronger model deficits .

Challenges, Recommendations, and Transferability
One major issue we encountered, which we believe to be general, is related to the additional degrees of freedom introduced by making different parameters stochastic. The freedom w.r.t. some stochastic parameters led to exaggerated temporal variability in these parameters, which was misused to reduce errors originating from other sources. In particular, parameters to which the likelihood was rather insensitive suffered from this deficiency, presumably because large changes in these parameters are necessary to achieve a higher likelihood. This shows that failing to constrain the stochastic parameters through an adequate prior for their asymptotic standard deviation, σ s , which is difficult in our experience, can lead to unreasonable behavior of the SPM. We find that an effective solution to this is to make the parameter that has the largest impact on the likelihood time-dependent, which is also conceptually reasonable, since in that case, the degrees of freedom are introduced where they are most needed. A systematic procedure for selecting which parameters to AMMANN ET AL.  make time-dependent is still to be developed in future studies. Identifiability issues are expected to increase with more time-dependent parameters. A preliminary analysis in which we infer two time-dependent parameters jointly (see supporting information) shows that the two fast reservoirs S i and S c are less misused as slow reservoirs when their release rates are fitted jointly with  g k of the slow reservoir as compared to when they are fitted as single time-dependent parameters. This is an indication of potential benefits of inferring multiple time-dependent parameters, given that the most sensitive parameters are included.
Sampling from the posterior distribution was numerically challenging; the chosen sampler required up to 100 evaluations of the DPM for one iteration of the Markov chain in order to achieve a reasonable acceptance rate. This resulted in a rate of ca. 2,500 samples per week on a modern processor. This sampler is therefore only applicable if at least one of the following conditions is met: (a) the model is cheap to run, (b) a limited amount of information is contained in the inferred time series (i.e., a short length or a small signal-to-noise ratio), or (c) only part of the simulation period has to be re-evaluated if the value of the time-dependent parameters have changed only on a short sub-interval.
When choosing the parameter(s) to be made time-dependent, we recommend to focus on the ones to which the likelihood function is most sensitive. The degree of flexibility of the empirical model, g s , on which the screening is based, should be reasonable in light of the amount of data; a g s that is too simple will conceal interesting dependencies, an overly complex g s will find many coincidental relationships that lack physical meaning. Since it is not a priori clear what the optimal complexity is, such patterns need to be interpreted with great caution. Section 3.3 gives an example of how to scan the potentially large amount of results for patterns (visualized in Figures 3 and 4), which are expected to work well for a large class of problems. Section 3.4 presents a generally applicable empirical improvement of the original model. It might be preferable to design more process-based changes, but these are much more dependent on the case study and should be formulated by the respective domain expert.
Since we explore a single case study, we can only make limited statements about the general applicability of the stochastic parameter approach to other problems or fields. On the other hand, the study involves a coupled hydrological and WQ model and joint inference to three types of data with vastly different number of observations each. This proves the transferability of the approach from purely hydrological applications (Reichert & Mieleitner, 2009;Reichert et al., 2020) to WQ models, which increases the confidence in further transferability to other models, case studies, or domains. In WQ modeling, approaches like this are especially needed due to the large uncertainties and the lack of data, which often precludes other flexible approaches that are more data-hungry, like machine-learning techniques.

Conclusions
This study has shown that SPMs based on time-dependent, stochastic parameters can be used at the catchment scale to assess dynamic herbicide pollution from diffuse sources on short timescales. This is one step in a potentially much broader future application of similar techniques in WQ modeling, a field in which SPMs have not received sufficient attention so far.
The analysis with time-dependent parameters revealed that the parameters controlling the hydrological processes lead to a larger increase in the observational likelihood (i.e., goodness of fit) when being made stochastic than the parameters controlling the chemical processes. Based on the temporal dynamics of the inferred parameter time series we conclude that the release rate of the slow reservoir seems to be slightly underestimated during dry conditions with the original deterministic model. After verifying that the predictive uncertainty is reasonable and after accounting for the identified deficit by adapting the DPM and repeating inference, we find that the increase in model performance in terms of observation likelihood is rather limited. Therefore we conclude that the identified deficit of the original model is not severe, which is a prerequisite for a save interpretation of the uncertainty estimates obtained with the chosen approach.
The resulting temporal dynamics of the time-dependent parameters were shown to contain information that is interpretable by the modeler or the domain expert. The spread of the posterior distribution of a parameter's time-series provides additional information about the periods during which the observed model output is especially sensitive to this parameter. This proves that the chosen method is able to disclose AMMANN ET AL.

10.1029/2020WR028311
18 of 27 systematic deficits of a model's internal process formulation that have been unaccounted for so far. It is therefore a promising tool to produce insights that stimulate experts' reasoning about the respective parts of the system and about potential model improvements. We showed that the SPM leads to a more reasonable distinction of process-related and residual stochasticity than the DPM. This allows for a better separation of the predictions of the true states of interest from their random observation error. On one hand, this leads to model output that better resembles the observations in terms of elevated autocorrelation. On the other hand, with the SPM, we obtain a more realistic uncertainty of internal model states and of the predicted true output, and a smaller residual (observation) error than with the DPM.
One major practical challenge when applying the tested approach turned out to be the control over the excessive degrees of freedom introduced with time-dependent parameters. This calls for further investigations into the possibilities of constraining the stochastic parameters, for example, through the prior distribution. Other challenges of more numerical nature were linked to the chosen sampler. These suggest that the sampler is only applicable if rather restrictive conditions are met and further research is needed to design more efficient samplers capable of inference with stochastic models. Future research should also focus on the application of SPMs to longer time series of observations that allow for proper calibration/validation splits and on better exploring the capability of such models to account for the large uncertainty in input that is often encountered in WQ settings, but which was of limited relevance in this study. Overall, there is a large potential benefit of using more SPMs to assess WQ problems at the catchment scale in the future, since we are often forced to use structurally deficient models to describe systems that obviously behave highly stochastic at the resolution we can observe them. Note. The distribution types are: LN, lognormal, U, uniform, Exp, exponential, "Fixed": parameter was fixed at the respective value and not inferred. A rather high R 2 (0.42, Figure 3b) was obtained when describing the temporal dynamics of  t,z2 S as a function of S u . The coinciding dynamics of the two quantities are shown in the original space in the last two rows of Figure B1. However, a dependency of the two quantities would be in contradiction of the physical understanding we have of sorption processes. Especially the peak of  t,z2 S close to the application day and the following sharp decrease ( Figure B1) is unrealistic and probably caused by compensating the models tendency to overestimate the concentration during a small rainfall event shortly after application. On the other hand, the subsequent slow increase in sorption equilibrium during the second half of the experiment ( Figure B1) is in agreement with a process called "aging" of soils. This process has been observed at many sites (Wauchope et al., 2002) and was also observed in the study catchment (Camenzuli, 2010), but it was not included in the model m to keep parsimony. The results of the two-dimensional local regression reveal a significant increase of R 2 to 0.63 when additionally including S g as an explanatory variable for  * t,z2 S ( Figure C2). Note that this should be interpreted with great caution, as the concentration measurements that support the inference of 