Residual errors of hydrological models are usually both heteroscedastic and autocorrelated. However, only a few studies have attempted to explicitly include these two statistical properties into the residual error model and jointly infer them with the hydrological model parameters. This technical note shows that applying autoregressive error models to raw heteroscedastic residuals, as done in some recent studies, can lead to unstable error models with poor predictive performance. This instability can be avoided by applying the autoregressive process to standardized residuals. The theoretical analysis is supported by empirical findings in three hydrologically distinct catchments. The case studies also highlight strong interactions between the parameters of autoregressive residual error models and the water balance parameters of the hydrological model.
 The residual errors of hydrological models, which represent the combined effects of data and model errors, are usually both heteroscedastic and autocorrelated [e.g., Sorooshian and Dracup, 1980; Kuczera, 1983; Bates and Campbell, 2001]. Heteroscedasticity is related to larger errors being generally associated with larger rainfalls and streamflows [e.g., Villarini and Krajewski, 2008; Thyer et al., 2009]. It can be represented by directly conditioning the variance of residual errors on explanatory variables such as runoff [e.g., Sorooshian and Dracup, 1980; Thyer et al., 2009; Pianosi and Raso, 2012], or by applying Box-Cox and other transformations [e.g., Kuczera, 1983; Bates and Campbell, 2001; Smith et al., 2010]. The direct conditioning approach is of particular interest because it allows exploiting additional information through explanatory variables. Autocorrelation is related to the “memory” of hydrological models, with storage errors propagating across multiple consecutive time steps [e.g., Kavetski et al., 2003]. It can be represented using autoregressive (AR) models [Kuczera, 1983], typically under lag-1 [AR(1)] assumptions [Schaefli et al., 2007; Schoups and Vrugt, 2010].
 This study pursues improved probabilistic descriptions of predictive and parametric uncertainties in hydrological modeling using residual error models [e.g., Kuczera, 1983; Bates and Campbell, 2001; Gallagher and Doherty, 2007; Willems, 2009; Smith et al., 2010, and many others]. It focuses on the joint inference of heteroscedasticity, autocorrelation and hydrological parameters. Recent work in this direction includes Schoups and Vrugt [2010], where a heteroscedastic skewed exponential distribution was combined with an AR(1) model, and all statistical and hydrological parameters were estimated jointly. In this note, we show that a seemingly straightforward combination of heteroscedasticity and autocorrelation can result in error models with poor statistical and computational properties. We then present an alternative conceptualization of the heteroscedastic AR(1) model with a notably more robust performance.
 The presentation is structured as follows. Section 2 derives the statistical properties of two alternative heteroscedastic AR(1) error models. Section 3 empirically compares the predictive reliability, precision, and parameter inference (including parameter interactions) for the two error models on three hydrologically distinct catchments. The note concludes with a summary of key findings and practical recommendations in section 4.
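 Most statistical calibration schemes are based on residual errors $\varepsilon_t$, defined as
$$\varepsilon_t = \tilde{Q}_t - Q_t(\theta_H, \tilde{X}, S_0) \tag{1}$$
where $\tilde{Q}_t$ and $Q_t$ are, respectively, the observed and simulated flows at time step $t$. As indicated, $Q_t$ is a function of the forcing data $\tilde{X}$ (e.g., rainfall and evapotranspiration), the hydrological model parameters $\theta_H$ and the initial conditions $S_0$.
 The Bayesian posterior distribution corresponding to such residual error models is
$$p(\theta_H, \theta_\varepsilon \mid \tilde{Q}, \tilde{X}) \propto p(\tilde{Q} \mid \theta_H, \theta_\varepsilon, \tilde{X}) \, p(\theta_H, \theta_\varepsilon) \tag{2}$$
where $\theta_\varepsilon$ denotes the parameters of the residual error model (such as standard deviation, etc.). The likelihood function in equation (2) is given by the joint pdf of the residuals
$$p(\tilde{Q} \mid \theta_H, \theta_\varepsilon, \tilde{X}) = p(\varepsilon \mid \theta_\varepsilon) \tag{3}$$
where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_{N_t})$ represents the vector of residual errors, computed over the calibration period $t = 1, \ldots, N_t$. A warm-up period of $N_w$ time steps is used prior to the calibration period.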
2.1. Approach 1: AR(1) Model Applied to Raw Residuals
2.1.1. Formulation and Basic Properties
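 Let the raw residual errors $\varepsilon_t$ be described by
$$\varepsilon_t = \phi_\varepsilon \varepsilon_{t-1} + z_t, \qquad z_t \sim N(0, \sigma_{z(t)}^2) \tag{4}$$
where the innovations $z_t$ are independent zero-mean Gaussian deviates with a time-varying standard deviation $\sigma_{z(t)}$, and $\phi_\varepsilon$ is the lag-1 autoregressive parameter.
 To account for heteroscedasticity, we use a linear model for the standard deviation $\sigma_{z(t)}$, conditioned on the simulated streamflow $Q_t$, with parameters $a_z$ (intercept) and $b_z$ (slope),
$$\sigma_{z(t)} = a_z + b_z Q_t \tag{5}$$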
 This approach in equations (4) and (5) is analogous to the scheme used by Schoups and Vrugt [2010], except that here, in order to focus more directly on autocorrelation and heteroscedasticity, we use Gaussian assumptions and do not introduce bias, skew, and kurtosis parameters.
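 As an illustration of equations (4) and (5), the following Python sketch simulates raw residuals under Approach 1. It is not taken from the study: the function name, the argument names `a` and `b` (intercept and slope of the linear standard deviation model), the zero starting residual, and the synthetic flows are all assumptions made for the example.

```python
import random


def simulate_raw_ar1(q_sim, phi, a, b, seed=0):
    """Simulate raw residuals eps_t = phi * eps_{t-1} + z_t (equation (4)),
    with z_t ~ N(0, (a + b * Q_t)^2) (equation (5))."""
    rng = random.Random(seed)
    eps_prev = 0.0  # assumed starting residual (stands in for the warm-up)
    eps = []
    for q in q_sim:
        sigma_z = a + b * q              # innovation sd grows with simulated flow
        z = rng.gauss(0.0, sigma_z)      # heteroscedastic Gaussian innovation
        eps_prev = phi * eps_prev + z    # AR(1) applied to the raw residual
        eps.append(eps_prev)
    return eps


# Synthetic hydrograph (hypothetical values): base flow, a peak, a recession.
q_sim = [1.0, 1.0, 20.0, 10.0, 5.0, 2.5, 1.0]
eps = simulate_raw_ar1(q_sim, phi=0.9, a=0.1, b=0.2)
```

 In this construction the large innovation drawn at the peak is carried forward by the autoregressive term into the recession steps that follow.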
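 From equation (4), the residual errors can be expressed in terms of the innovations
$$\varepsilon_t = \sum_{i=0}^{t+N_w-1} \phi_\varepsilon^{i} z_{t-i} + \phi_\varepsilon^{t+N_w} \varepsilon_{-N_w} \tag{6}$$
where $\varepsilon_{-N_w}$ is the residual at the start of the warm-up period and, as $N_w \to \infty$, the term $\phi_\varepsilon^{t+N_w} \varepsilon_{-N_w}$ becomes negligible provided $|\phi_\varepsilon| < 1$.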
 Equation (6) shows that the marginal distribution of an AR(1) process with heteroscedastic Gaussian innovations is also Gaussian: ϕε is a constant, so equation (6) is a sum of independent scaled Gaussian deviates.
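 Equation (6) also shows that the process has zero mean ($E[\varepsilon_t] = 0$) because $E[z_t] = 0$. Its variance can be obtained by considering that the innovations $z_t$ are defined as mutually independent,
$$\sigma_{\varepsilon(t)}^2 = \mathrm{Var}[\varepsilon_t] = \sum_{i=0}^{\infty} \phi_\varepsilon^{2i} \sigma_{z(t-i)}^2 \tag{7}$$
 In general, $\sigma_{\varepsilon(t)}^2$ does not have a closed form expression. When $\sigma_{z(t)} = \sigma_z$ (i.e., constant variance in time), it simplifies to $\sigma_z^2 / (1 - \phi_\varepsilon^2)$, which corresponds to the marginal variance of a homoscedastic AR(1) process. Note that when the autoregressive coefficient $\phi_\varepsilon$ approaches 1, the AR(1) process becomes nonstationary [Box and Jenkins, 1976]. In particular, when $|\phi_\varepsilon| \geq 1$, the leading term of equation (6), which carries the initial residual, does not decay as $N_w \to \infty$, and hence $\sigma_{\varepsilon(t)}^2$ becomes infinite. As will be shown in section 3, this has major implications for residual error model behavior.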
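 The infinite sum in equation (7) can also be evaluated recursively, since equation (4) and the independence of the innovations give Var[ε_t] = ϕε² Var[ε_{t−1}] + σ_z(t)². The sketch below uses this recursion to check the homoscedastic limit and the unbounded growth at ϕε = 1. The function name and the zero starting variance are assumptions for illustration only.

```python
def marginal_variance(sigma_z, phi, var0=0.0):
    """Marginal variance of eps_t for a sequence of innovation sds,
    using var_t = phi^2 * var_{t-1} + sigma_z[t]^2 (recursive form of eq. (7))."""
    var_prev = var0
    out = []
    for s in sigma_z:
        var_prev = phi ** 2 * var_prev + s ** 2
        out.append(var_prev)
    return out


# Homoscedastic case: the variance settles at sigma_z^2 / (1 - phi^2).
phi, s = 0.9, 1.0
var_long = marginal_variance([s] * 500, phi)[-1]
limit = s ** 2 / (1.0 - phi ** 2)

# phi = 1: the variance grows by sigma_z^2 every step and never settles.
var_unit_root = marginal_variance([1.0] * 100, 1.0)[-1]
```

 With ϕε = 1 the recursion simply adds σ_z² at each step, so the variance after n steps is n σ_z², which illustrates the nonstationary behavior described above.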
2.1.2. Likelihood Function
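 Approach 1 introduces error model parameters $\theta_\varepsilon = (\phi_\varepsilon, a_z, b_z)$, i.e., the autoregressive parameter and the intercept and slope of the heteroscedasticity model in equation (5), and results in the likelihood
$$p(\varepsilon \mid \theta_\varepsilon) = p(\varepsilon_1) \prod_{t=2}^{N_t} p(\varepsilon_t \mid \varepsilon_{t-1}) \tag{8}$$
$$p(\varepsilon_t \mid \varepsilon_{t-1}) = f_N(\varepsilon_t; \phi_\varepsilon \varepsilon_{t-1}, \sigma_{z(t)}) \tag{9}$$
where $f_N(x; \mu, \sigma)$ denotes the pdf of a scalar Gaussian random deviate $x$ with mean $\mu$ and standard deviation $\sigma$.
 These equations correspond exactly to the scheme of Schoups and Vrugt [2010] when the latter is applied with no bias, skew, or kurtosis. For the first time step, we use the marginal distribution $p(\varepsilon_1)$, which as shown above is Gaussian with zero mean and variance given by equation (7).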
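 A minimal sketch of the Approach 1 log-likelihood in equations (8) and (9) is given below. The function names are hypothetical, and the marginal standard deviation of the first residual (`sigma_eps1`, from equation (7)) is assumed to be supplied by the caller, because it depends on the warm-up history.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log of the Gaussian pdf with mean mu and standard deviation sigma."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)


def loglik_approach1(eps, q_sim, phi, a, b, sigma_eps1):
    """Log-likelihood of raw residuals under Approach 1.

    eps        : raw residuals (observed minus simulated flow)
    q_sim      : simulated flows, same length as eps
    phi, a, b  : AR(1) parameter, sd intercept, sd slope
    sigma_eps1 : marginal sd of the first residual (assumed given)
    """
    # First time step: marginal Gaussian with zero mean.
    ll = gaussian_logpdf(eps[0], 0.0, sigma_eps1)
    # Remaining steps: conditional Gaussian centred on phi * eps_{t-1},
    # with the flow-dependent innovation sd of equation (5).
    for t in range(1, len(eps)):
        sigma_z = a + b * q_sim[t]
        ll += gaussian_logpdf(eps[t], phi * eps[t - 1], sigma_z)
    return ll
```

 Working in log space avoids numerical underflow when the product in equation (8) runs over several years of daily data.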
2.1.3. Lag-1 Autocorrelation of the Residuals
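 The lag-1 autocorrelation of the raw residuals can be derived as
$$\rho(\varepsilon_t, \varepsilon_{t-1}) = \frac{\mathrm{Cov}[\varepsilon_t, \varepsilon_{t-1}]}{\sigma_{\varepsilon(t)} \sigma_{\varepsilon(t-1)}} = \frac{\phi_\varepsilon \sigma_{\varepsilon(t-1)}^2}{\sigma_{\varepsilon(t)} \sigma_{\varepsilon(t-1)}} = \phi_\varepsilon \frac{\sigma_{\varepsilon(t-1)}}{\sigma_{\varepsilon(t)}} \tag{10}$$
 Equation (10) shows that in Approach 1 the lag-1 autocorrelation of the residuals depends on both the autoregressive parameter $\phi_\varepsilon$ and on the form of heteroscedasticity.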
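 To make the time dependence in equation (10) concrete, the sketch below evaluates the lag-1 autocorrelation of the raw residuals along a synthetic hydrograph. It relies on Cov[ε_t, ε_{t−1}] = ϕε Var[ε_{t−1}], which follows from equation (4). The function name, parameter values and flows are illustrative assumptions, and the residual variance is assumed to start from zero before the first step.

```python
import math


def lag1_autocorrelation(q_sim, phi, a, b):
    """Lag-1 autocorrelation of raw residuals under Approach 1:
    rho_t = phi * sd(eps_{t-1}) / sd(eps_t), for t = 1 .. len(q_sim) - 1."""
    var, var_prev = [], 0.0
    for q in q_sim:
        # Marginal variance recursion implied by equation (4).
        var_prev = phi ** 2 * var_prev + (a + b * q) ** 2
        var.append(var_prev)
    return [phi * math.sqrt(var[t - 1] / var[t]) for t in range(1, len(var))]


# Hypothetical hydrograph: base flow, a peak at index 2, then a recession.
q_sim = [1.0, 1.0, 20.0, 10.0, 5.0, 2.5, 1.0]
rho = lag1_autocorrelation(q_sim, phi=0.95, a=0.1, b=0.2)
```

 For these values the autocorrelation drops well below ϕε at the peak, where a large new innovation dominates, and rises above ϕε during the recession, where the accumulated error dominates.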
2.2. Approach 2: AR(1) Model Applied to Standardized Residuals
 This section considers an alternative heteroscedastic AR(1) model, where the AR(1) assumptions are applied to standardized residuals, rather than to the raw residuals.
2.2.1. Formulation and Basic Properties
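 Let us define a standardized residual error $\eta_t$ as follows:
$$\eta_t = \frac{\varepsilon_t}{\sigma_{\varepsilon(t)}} \tag{11}$$
where $\sigma_{\varepsilon(t)}$ is the standard deviation of the raw residual at time $t$.
 Analogously to Approach 1, we can assume linear heteroscedasticity, with intercept $a_\varepsilon$ and slope $b_\varepsilon$,
$$\sigma_{\varepsilon(t)} = a_\varepsilon + b_\varepsilon Q_t \tag{12}$$
 We now apply the AR(1) process on the standardized residual errors, defining innovations $y_t$ as
$$y_t = \eta_t - \phi_\eta \eta_{t-1}, \qquad y_t \sim N(0, \sigma_y^2) \tag{13}$$
 Equation (13) corresponds to a standard homoscedastic AR(1) process with marginal variance $\sigma_y^2 / (1 - \phi_\eta^2)$ and, since $\mathrm{Var}[\eta_t] = 1$ due to the standardization in equation (11), it follows that $\sigma_y^2 = 1 - \phi_\eta^2$.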
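 The construction in equations (11) to (13) can be sketched as a simulator: a unit-variance AR(1) process is generated first and then scaled by the flow-dependent standard deviation. The function and argument names, the stationary starting draw and the synthetic flows are assumptions for illustration.

```python
import math
import random


def simulate_standardized_ar1(q_sim, phi, a, b, seed=0):
    """Simulate raw residuals under Approach 2:
    eta_t = phi * eta_{t-1} + y_t with y_t ~ N(0, 1 - phi^2),
    then eps_t = (a + b * Q_t) * eta_t."""
    rng = random.Random(seed)
    sd_y = math.sqrt(1.0 - phi ** 2)   # keeps Var[eta_t] equal to 1
    eta = rng.gauss(0.0, 1.0)          # assumed stationary start, eta ~ N(0, 1)
    eps = []
    for t, q in enumerate(q_sim):
        if t > 0:
            eta = phi * eta + rng.gauss(0.0, sd_y)
        eps.append((a + b * q) * eta)  # heteroscedastic scaling applied last
    return eps


# Hypothetical hydrograph: base flow, a peak, a recession.
q_sim = [1.0, 1.0, 20.0, 10.0, 5.0, 2.5, 1.0]
eps = simulate_standardized_ar1(q_sim, phi=0.9, a=0.1, b=0.2)
```

 Because the scaling is applied after the autoregressive step, the standard deviation of each residual is set by the flow at that time step only.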
2.2.2. Likelihood Function
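 The error model parameters in Approach 2 are $\theta_\varepsilon = (\phi_\eta, a_\varepsilon, b_\varepsilon)$, i.e., the autoregressive parameter and the intercept and slope of the heteroscedasticity model in equation (12). The likelihood function, given by the joint pdf of the residuals, $p(\varepsilon \mid \theta_\varepsilon)$, must account for the transformation in equation (11)
$$p(\varepsilon \mid \theta_\varepsilon) = p(\eta \mid \theta_\varepsilon) \, |\det J_\eta| \tag{14}$$
where $|\det J_\eta|$ is the absolute value of the determinant of the Jacobian matrix of the transformation in equation (11) and can be derived to be $\prod_{t=1}^{N_t} \sigma_{\varepsilon(t)}^{-1}$.
 The likelihood function is then
$$p(\varepsilon \mid \theta_\varepsilon) = \left[ \prod_{t=1}^{N_t} \frac{1}{\sigma_{\varepsilon(t)}} \right] p(\eta_1) \prod_{t=2}^{N_t} p(\eta_t \mid \eta_{t-1}) \tag{15}$$
 Since we assumed Gaussian innovations $y_t$ in equation (13), we have
$$p(\eta_t \mid \eta_{t-1}) = f_N\left(\eta_t; \phi_\eta \eta_{t-1}, \sqrt{1 - \phi_\eta^2}\right) \tag{16}$$
 In addition, from section 2.2.1, the marginal distribution at $t = 1$ is $p(\eta_1) = f_N(\eta_1; 0, 1)$.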
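 The Approach 2 likelihood, including the Jacobian term of the standardization, could be coded along the following lines. This is a sketch under the Gaussian and linear heteroscedasticity assumptions stated above, with hypothetical function and argument names.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log of the Gaussian pdf with mean mu and standard deviation sigma."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)


def loglik_approach2(eps, q_sim, phi, a, b):
    """Log-likelihood of raw residuals under Approach 2 (requires |phi| < 1)."""
    sigma = [a + b * q for q in q_sim]            # sd of each raw residual
    eta = [e / s for e, s in zip(eps, sigma)]     # standardized residuals
    ll = -sum(math.log(s) for s in sigma)         # log |det J| of the standardization
    ll += gaussian_logpdf(eta[0], 0.0, 1.0)       # marginal of the first step
    sd_y = math.sqrt(1.0 - phi ** 2)              # innovation sd giving Var[eta] = 1
    for t in range(1, len(eta)):
        ll += gaussian_logpdf(eta[t], phi * eta[t - 1], sd_y)
    return ll
```

 A useful sanity check is the case ϕη = 0, where the expression reduces to a product of independent Gaussian densities with standard deviation a + b Q_t.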
2.2.3. Lag-1 Autocorrelation
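 The lag-1 autocorrelation of residual errors in Approach 2 is
$$\rho(\varepsilon_t, \varepsilon_{t-1}) = \frac{\mathrm{Cov}[\varepsilon_t, \varepsilon_{t-1}]}{\sigma_{\varepsilon(t)} \sigma_{\varepsilon(t-1)}} = \frac{\sigma_{\varepsilon(t)} \sigma_{\varepsilon(t-1)} \mathrm{Cov}[\eta_t, \eta_{t-1}]}{\sigma_{\varepsilon(t)} \sigma_{\varepsilon(t-1)}} = \phi_\eta \tag{17}$$
where we used the result
$$\mathrm{Cov}[\eta_t, \eta_{t-1}] = \mathrm{Cov}[\phi_\eta \eta_{t-1} + y_t, \eta_{t-1}] = \phi_\eta \mathrm{Var}[\eta_{t-1}] = \phi_\eta \tag{18}$$
 Equation (17) shows that in Approach 2 the lag-1 autocorrelation is constant in time and corresponds exactly to the autoregressive parameter $\phi_\eta$. This can be contrasted with equation (10) for Approach 1, where the lag-1 autocorrelation has a more complex structure.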
2.3. Similarities and Differences Between Approaches 1 and 2
 Although Approaches 1 and 2 represent heteroscedasticity and autocorrelation using similar equations, the order in which these properties are treated is different (Table 1). In Approach 1, an AR(1) model is applied to the raw residuals, followed by the application of a heteroscedastic model to its innovations (i.e., the conditional variance of the AR(1) process is allowed to vary in time). In contrast, Approach 2 applies a heteroscedastic model to standardize the raw residuals, followed by the application of a homoscedastic AR(1) model to the standardized residuals (i.e., after the heteroscedasticity has been removed, or at least substantially reduced).
Table 1. Summary of Equations for Approaches 1 and 2
Table 2. Catchment Properties and Calibration and Validation Periods
Catchment Name and Number | Mean Annual Rainfall (mm) | Mean Annual Runoff (mm) | Percentage of Days with Runoff <1 mm | Calibration and Validation Periods
 | | | | 28 Oct 1976–2 Oct 1978; 3 Oct 1978–6 Sep 1983
 | | | | 22 Jun 1980–8 Sep 1988; 9 Sep 1988–19 Apr 1995
French Broad 3451500 | | | | 8 Sep 1973–25 Nov 1981; 26 Nov 1981–30 Apr 1998
 These structural differences lead to important differences in the mathematical behavior of the two approaches. Both approaches include autoregressive equations that accumulate the errors from previous innovations. However, in Approach 1 the innovations are heteroscedastic and can result in particularly large accumulated errors. For example, equation (7) shows that large innovations associated with peak streamflows propagate into the predictive uncertainty of subsequent recession time steps. This propagation is particularly strong when ϕε ≈ 1. Conversely, in Approach 2 the innovations of the AR(1) process are homoscedastic, and the heteroscedasticity in equation (12) is applied after (rather than before) the innovations are accumulated. The practical implications of these differences on the behavior of the error models will be investigated in an empirical case study (section 3).
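 The error accumulation argument can be illustrated numerically by comparing the marginal standard deviation of the raw residuals under the two approaches along a synthetic hydrograph. The sketch below is illustrative only: the flows and parameter values are hypothetical, the same intercept and slope are used for both approaches purely for comparison (calibrated values would generally differ), and the Approach 1 variance is assumed to start from zero.

```python
import math


def marginal_sd_approach1(q_sim, phi, a, b):
    """sd of eps_t when AR(1) is applied to raw residuals: the variance
    accumulates past innovations, var_t = phi^2 * var_{t-1} + (a + b*Q_t)^2."""
    var_prev, sd = 0.0, []
    for q in q_sim:
        var_prev = phi ** 2 * var_prev + (a + b * q) ** 2
        sd.append(math.sqrt(var_prev))
    return sd


def marginal_sd_approach2(q_sim, a, b):
    """sd of eps_t when AR(1) is applied to standardized residuals:
    it depends only on the current simulated flow."""
    return [a + b * q for q in q_sim]


# Base flow, a peak, then a recession back to base flow.
q_sim = [1.0, 1.0, 20.0, 10.0, 5.0, 2.5, 1.0]
sd1 = marginal_sd_approach1(q_sim, phi=0.95, a=0.1, b=0.2)
sd2 = marginal_sd_approach2(q_sim, a=0.1, b=0.2)

for q, s1, s2 in zip(q_sim, sd1, sd2):
    print(f"Q = {q:5.1f}   Approach 1 sd = {s1:6.2f}   Approach 2 sd = {s2:6.2f}")
```

 For these values, when the flow has returned to its base level at the end of the recession, the Approach 1 standard deviation is still roughly an order of magnitude larger than the Approach 2 value, because the peak innovation variance has decayed only by a factor of ϕε² per step.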
 Approach 2 can be viewed as a particular case of the common “variance-stabilizing” strategy of transforming residual errors and then specifying a (possibly correlated) homoscedastic probability distribution of the transformed residuals. For example, see equations (A4) and (A5) in Bates and Campbell [2001], where a homoscedastic AR(1) process is applied to Box-Cox-transformed residuals. In Approach 2, the homoscedastic AR(1) process is applied to standardized residuals. Unlike the Box-Cox transformation, the explicit standardization in equations (11) and (12) directly exploits the conditioning on simulated streamflow as an explanatory variable.
 In contrast, Approach 1 corresponds to applying a joint “decorrelation” transformation $z_t = \varepsilon_t - \phi_\varepsilon \varepsilon_{t-1}$, followed by specifying a heteroscedastic distribution of the transformed residuals $z$. It can hence be viewed as a special case of an “autocorrelation-reduction” strategy where the aim of the transformation is to reduce autocorrelation rather than heteroscedasticity.
3. Empirical Case Study
3.1. Hydrological Data, Models, and Methodology
 Approaches 1 and 2 are compared in three catchments: Lacmalac and Tinderry (southeastern Australia), and the French Broad River (Asheville, North Carolina, USA). The climatology of these catchments and the daily data periods used in the analysis are listed in Table 2. These catchments provide a range of different climatologies and hydrological regimes: Tinderry is dry (ephemeral), Lacmalac is wet and French Broad River is particularly wet. Simulated streamflow is obtained using GR4J, a lumped rainfall-runoff model with 4 fitted parameters [Perrin et al., 2003]. Uniform priors are used on all inferred quantities, with parameter ranges specified in Table 3. The posterior distributions are optimized using a quasi-Newton method [Kavetski and Clark, 2010] and sampled using a multistage Metropolis algorithm [Thyer et al., 2009].
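 For readers who want to experiment with the inference setup, the sketch below shows how uniform priors and a log-likelihood combine into a log-posterior, and how it can be sampled. Note that it uses a plain random-walk Metropolis sampler, which is a simpler stand-in and not the multistage Metropolis algorithm or the quasi-Newton optimizer used in the study. All names, bounds and step sizes are hypothetical.

```python
import math
import random


def log_posterior(theta, log_lik, bounds):
    """Uniform priors: constant inside the bounds, zero probability outside."""
    for value, (lower, upper) in zip(theta, bounds):
        if not lower <= value <= upper:
            return -math.inf
    return log_lik(theta)


def metropolis(log_post, theta0, step, n_iter, seed=0):
    """Random-walk Metropolis with independent Gaussian proposals.
    theta0 must lie inside the prior bounds."""
    rng = random.Random(seed)
    theta, lp = list(theta0), log_post(theta0)
    chain = []
    for _ in range(n_iter):
        proposal = [v + rng.gauss(0.0, s) for v, s in zip(theta, step)]
        lp_prop = log_post(proposal)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        chain.append(list(theta))
    return chain


# Toy usage: a standard normal log-likelihood with a uniform prior on [-1, 1].
bounds = [(-1.0, 1.0)]
toy_log_lik = lambda th: -0.5 * th[0] ** 2
chain = metropolis(lambda th: log_posterior(th, toy_log_lik, bounds),
                   theta0=[0.0], step=[0.5], n_iter=200)
```

 In an actual application, `log_lik` would run the hydrological model for the proposed parameters, compute the residuals, and evaluate the likelihood of Approach 1 or Approach 2.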
Table 3. Parameter Specifications
Maximum capacity of the production store
Groundwater exchange coefficient
Maximum capacity of the routing store
Time base of unit hydrograph
 Several performance metrics are used. Statistical reliability is evaluated using the predictive QQ plot [Thyer et al., 2009], and precision is quantified using the average (in time) coefficient of variation of the predictive distribution. The adequacy of the AR(1) approximation is assessed using the autocorrelation function (ACF) of the standardized innovations, i.e., $z_t / \sigma_{z(t)}$ for Approach 1 (see equation (4)) and $y_t / \sigma_y$ for Approach 2 (see equation (13)). The ACF should be as close as possible to 0. We appraise the extent to which the standardized innovations are homoscedastic by plotting them against the quantile of the simulated flow, and the extent to which they are Gaussian by comparing their empirical marginal density to a Gaussian distribution. Finally, we inspect the posterior parameter distributions and the interactions between hydrological and error model parameters.
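 The innovation diagnostics can be sketched as follows: the standardized innovations are recovered from the residuals for each approach, and their sample ACF is then computed. The function names are hypothetical, linear heteroscedasticity with intercept `a` and slope `b` is assumed, and the ACF uses the common biased estimator normalized by the lag-0 sum of squares.

```python
import math


def std_innovations_approach1(eps, q_sim, phi, a, b):
    """z_t / sigma_z(t), with z_t = eps_t - phi * eps_{t-1}."""
    return [(eps[t] - phi * eps[t - 1]) / (a + b * q_sim[t])
            for t in range(1, len(eps))]


def std_innovations_approach2(eps, q_sim, phi, a, b):
    """y_t / sigma_y, with eta_t = eps_t / (a + b*Q_t) and
    y_t = eta_t - phi * eta_{t-1}."""
    eta = [e / (a + b * q) for e, q in zip(eps, q_sim)]
    sd_y = math.sqrt(1.0 - phi ** 2)
    return [(eta[t] - phi * eta[t - 1]) / sd_y for t in range(1, len(eta))]


def acf(x, max_lag):
    """Sample autocorrelation of x at lags 1 .. max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]
```

 If the AR(1) and heteroscedasticity assumptions hold, the ACF of either set of standardized innovations should be close to 0 at all lags.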
3.2. Visual Assessment of Predictive Performance
 As shown in Figure 1, Approach 1 produces very vague predictions, with exceedingly wide and poorly behaved predictive bounds, especially during the recessions that follow flow peaks. This behavior is particularly pronounced in the Lacmalac and French Broad River catchments, which are both humid. It appears related to the error accumulation behavior discussed in section 2.3. Another undesirable property of Approach 1 is its exceedingly high proportion of large negative predicted flows. Even if the predictive distribution were truncated at zero to avoid negative flows, the resulting distribution would remain poor. Conversely, Approach 2 does not suffer from these problems and has much better-behaved prediction limits with a few small negative flows that can be safely truncated.
 For the Tinderry catchment, the predictive bounds obtained with Approaches 1 and 2 are very similar, and contain an exceedingly large fraction of negative flows. Given the ephemeral nature of this catchment, avoiding this degeneracy is likely to require a specialized treatment of low and zero flows, including the development of truncated likelihood functions [Smith et al., 2010].
3.3. Statistical Metrics of Predictive Performance
 Figure 2 compares Approaches 1 and 2 using a range of diagnostics. Consistent with the hydrographs in Figure 1, Approach 2 clearly outperforms Approach 1 in the Lacmalac and French Broad catchments, especially in terms of predictive precision (row 2 of Figure 2). Approach 2 also produces near-perfect reliability in the French Broad River, whereas Approach 1 systematically underestimates streamflow (row 1 of Figure 2).
 The error heteroscedasticity appears adequately captured by equations (5) and (12) in the (wet) Lacmalac and French Broad catchments, with the standardized innovations being nearly homoscedastic as required (row 3 of Figure 2). This is not the case in the (ephemeral) Tinderry basin, where there is a clear remaining trend in the relationship between the standardized innovations and streamflow. The curvature in this trend suggests a nonlinear heteroscedastic model warrants investigation.
 Density plots of the standardized innovations (row 4 of Figure 2) reveal that they are generally symmetric but kurtotic, which is consistent with earlier studies [Schoups and Vrugt, 2010]. The kurtosis is particularly strong in the Tinderry catchment, where it is combined with a slight asymmetry. The Gaussian assumptions are hence questionable in this case. The standardized innovations obtained with Approach 1 exhibit a slight positive bias for the French Broad catchment, which is not the case when Approach 2 is used.
 The autocorrelation structure of the residuals appears well approximated by the AR(1) assumption, with the ACF of the standardized innovations being close to 0 for both Approaches 1 and 2, in all three catchments (row 5 of Figure 2). This clearly contrasts with the large autocorrelations when the AR(1) error model component is omitted, and highlights the need for autocorrelated error models.
 In general, Approaches 1 and 2 produce very similar, and generally poor, results in the Tinderry catchment (Approach 1 yields a small gain in reliability while Approach 2 produces slightly higher precision). This can be attributed to the ephemerality of the Tinderry catchment and reinforces the need to improve the hydrological model (e.g., to capture the wetting-up thresholds) and employ a specialized treatment of zero and near-zero flows in the likelihood function [Smith et al., 2010].
3.4. Parameter Inference
 Figure 3 shows that the posterior parameter distributions are generally well behaved with both Approaches 1 and 2; most parameters appear well identified. However, the inferences differ and some problematic features are evident. In Approach 1, the GR4J water balance parameter (the groundwater exchange coefficient) is highly negative (indicating export of groundwater from the catchment) and is negatively correlated with the heteroscedasticity slope. In Approach 2, this parameter is close to 0, though it remains correlated with the heteroscedasticity slope. Approach 2 also has a strong positive correlation between its autoregressive and heteroscedasticity parameters, which appears absent in Approach 1. Similar results (not shown here) were found in the other two catchments.
4. Conclusions and Recommendations
 This technical note illustrates the challenges of fitting hydrological model parameters jointly with the autocorrelation and heteroscedasticity parameters of residual error models. An empirical case study was undertaken, based on three catchments with diverse hydrological dynamics and the widely used GR4J hydrological model. Two distinct residual error models based on Gaussian AR(1) processes were compared using a range of diagnostics. The main conclusions are as follows:
 (1) When jointly inferring heteroscedasticity and autocorrelation parameters, applying Gaussian AR(1) models directly to the raw residual errors can produce a poorly behaved error model with grossly exaggerated predictive uncertainty. This instability can be avoided by applying the Gaussian AR(1) process to standardized residuals. This empirical finding appears consistent with analytical insights into the error accumulation properties of the two statistical error models.
 (2) Hydrological parameters directly controlling the water balance can interact strongly with heteroscedasticity and autocorrelation parameters when fitted jointly with them. Since the current study employed a single hydrological model, further testing is needed to ascertain how this interaction relates to conclusion 1 above.
 (3) Applying the Gaussian AR(1) process to standardized residuals results in strong interactions between autoregressive and heteroscedastic parameters. Further research is required to understand the origin of these interactions and whether they can be eliminated or at least reduced (e.g., using alternative autocorrelation structures, parameter transformations or reparameterizations).
 (4) Ephemeral catchments remain particularly hard to model, and in addition to a robust representation of heteroscedasticity and autocorrelation, require a more robust handling of skew, near-zero flows, seasonality and other aspects (confirming previous studies on this topic). Moreover, in all three basins, there is evidence of excess kurtosis in the standardized innovations.
 Finding 1 can help hydrologists improve the calibration of hydrological models by avoiding fundamental statistical problems, while Findings 2–4 highlight areas for further research, in particular the analysis of these issues within more complex error models.