Pitfalls and improvements in the joint inference of heteroscedasticity and autocorrelation in hydrological model calibration



[1] Residual errors of hydrological models are usually both heteroscedastic and autocorrelated. However, only a few studies have attempted to explicitly include these two statistical properties into the residual error model and jointly infer them with the hydrological model parameters. This technical note shows that applying autoregressive error models to raw heteroscedastic residuals, as done in some recent studies, can lead to unstable error models with poor predictive performance. This instability can be avoided by applying the autoregressive process to standardized residuals. The theoretical analysis is supported by empirical findings in three hydrologically distinct catchments. The case studies also highlight strong interactions between the parameters of autoregressive residual error models and the water balance parameters of the hydrological model.

1. Introduction

[2] The residual errors of hydrological models, which represent the combined effects of data and model errors, are usually both heteroscedastic and autocorrelated [e.g., Sorooshian and Dracup, 1980; Kuczera, 1983; Bates and Campbell, 2001]. Heteroscedasticity is related to larger errors being generally associated with larger rainfalls and streamflows [e.g., Villarini and Krajewski, 2008; Thyer et al., 2009]. It can be represented by directly conditioning the variance of residual errors on explanatory variables such as runoff [e.g., Sorooshian and Dracup, 1980; Thyer et al., 2009; Pianosi and Raso, 2012], or by applying Box-Cox and other transformations [e.g., Kuczera, 1983; Bates and Campbell, 2001; Smith et al., 2010]. The direct conditioning approach is of particular interest because it allows exploiting additional information through explanatory variables. Autocorrelation is related to the “memory” of hydrological models, with storage errors propagating across multiple consecutive time steps [e.g., Kavetski et al., 2003]. It can be represented using autoregressive (AR) models [Kuczera, 1983], typically under lag-1 [AR(1)] assumptions [Schaefli et al., 2007; Schoups and Vrugt, 2010].

[3] This study pursues improved probabilistic descriptions of predictive and parametric uncertainties in hydrological modeling using residual error models [e.g., Kuczera, 1983; Bates and Campbell, 2001; Gallagher and Doherty, 2007; Willems, 2009; Smith et al., 2010, and many others]. It focuses on the joint inference of heteroscedasticity, autocorrelation and hydrological parameters. Recent work in this direction includes Schoups and Vrugt [2010], where a heteroscedastic skewed exponential distribution was combined with an AR(1) model, and all statistical and hydrological parameters estimated jointly. In this note, we show that a seemingly straightforward combination of heteroscedasticity and autocorrelation can result in error models with poor statistical and computational properties. We then present an alternative conceptualization of the heteroscedastic AR(1) model with a notably more robust performance.

[4] The presentation is structured as follows. Section 2 derives the statistical properties of two alternative heteroscedastic AR(1) error models. Section 3 empirically compares the predictive reliability, precision, and parameter inference (including parameter interactions) for the two error models on three hydrologically distinct catchments. The note concludes with a summary of key findings and practical recommendations in section 4.

2. Heteroscedastic Autocorrelated Residual Error Models

[5] Most statistical calibration schemes are based on residual errors $\varepsilon_t$, defined as

$$\varepsilon_t = \tilde{Q}_t - Q_t(X_{1:t}, \theta_h, S_0) \tag{1}$$

where $\tilde{Q}_t$ and $Q_t$ are, respectively, the observed and simulated flows at time step $t$. As indicated, $Q_t$ is a function of the forcing data $X_{1:t}$ (e.g., rainfall and evapotranspiration), the hydrological model parameters $\theta_h$ and the initial conditions $S_0$.

[6] The Bayesian posterior distribution corresponding to such residual error models is

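$$p(\theta_h, \theta_\varepsilon \mid \tilde{Q}, X) \propto p(\tilde{Q} \mid \theta_h, \theta_\varepsilon, X)\, p(\theta_h, \theta_\varepsilon) \tag{2}$$

where $\theta_\varepsilon$ denotes the parameters of the residual error model (such as standard deviation, etc.). The likelihood function in equation (2) is given by the joint pdf of the residuals

$$p(\tilde{Q} \mid \theta_h, \theta_\varepsilon, X) = p(\varepsilon(\theta_h) \mid \theta_\varepsilon) \tag{3}$$

where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_N)$ represents the vector of residual errors, computed over the calibration period $t = 1, \ldots, N$. A warm-up period of $N_w$ time steps is used prior to the calibration period.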

2.1. Approach 1: AR(1) Model Applied to Raw Residuals

2.1.1. Formulation and Basic Properties

[7] Let the raw residual errors $\varepsilon_t$ be described by

$$\varepsilon_t = \phi_\varepsilon\, \varepsilon_{t-1} + z_t, \qquad z_t \sim N(0, \sigma_{z(t)}^2) \tag{4}$$

where the innovations $z_t$ are independent zero-mean Gaussian deviates with a time-varying standard deviation $\sigma_{z(t)}$, and $\phi_\varepsilon$ is the lag-1 autoregressive parameter.

[8] To account for heteroscedasticity, we use a linear model for the standard deviation $\sigma_{z(t)}$, conditioned on the simulated streamflow $Q_t$, with parameters $a$ and $b$,

$$\sigma_{z(t)} = a + b\, Q_t \tag{5}$$

[9] This approach in equations (4) and (5) is analogous to the scheme used by Schoups and Vrugt [2010], except here, in order to focus more directly on autocorrelation and heteroscedasticity, we use Gaussian assumptions and do not introduce bias, skew, and kurtosis parameters.

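[10] From equation (4), the residual errors $\varepsilon_t$ can be expressed in terms of the innovations $z_t$

$$\varepsilon_t = \sum_{i=0}^{t+N_w-1} \phi_\varepsilon^{\,i}\, z_{t-i} + \phi_\varepsilon^{\,t+N_w}\, \varepsilon_{-N_w} \tag{6}$$

where $\varepsilon_{-N_w}$ is the residual at the start of the warm-up period and, as $N_w \to \infty$, the term $\phi_\varepsilon^{\,t+N_w}\, \varepsilon_{-N_w}$ becomes negligible provided $|\phi_\varepsilon| < 1$.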

[11] Equation (6) shows that the marginal distribution of an AR(1) process with heteroscedastic Gaussian innovations is also Gaussian: $\phi_\varepsilon$ is a constant and equation (6) is hence a sum of independent scaled Gaussian deviates.

[12] Equation (6) also shows that the process has zero mean ($E[\varepsilon_t] = 0$) because $E[z_t] = 0$. Its variance can be obtained by considering that the innovations $z_t$ are defined as mutually independent,

$$\sigma_{\varepsilon(t)}^2 = \mathrm{Var}[\varepsilon_t] = \sum_{i=0}^{\infty} \phi_\varepsilon^{2i}\, \sigma_{z(t-i)}^2 \tag{7}$$

[13] In general, $\sigma_{\varepsilon(t)}^2$ does not have a closed form expression. When $\sigma_{z(t)} = \sigma_z$ (i.e., constant variance in time), it simplifies to $\sigma_z^2 / (1 - \phi_\varepsilon^2)$, which corresponds to the marginal variance of a homoscedastic AR(1) process. Note that when the autoregressive coefficient $\phi_\varepsilon$ approaches 1, the AR(1) process becomes nonstationary [Box and Jenkins, 1976]. In particular, when $\phi_\varepsilon = 1$, the term $\phi_\varepsilon^{\,t+N_w}\, \varepsilon_{-N_w}$ in equation (6) does not decay as $N_w \to \infty$, and hence $\sigma_{\varepsilon(t)}^2$ becomes infinite. As will be shown in section 3, this has major implications for residual error model behavior.
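
The accumulation described by equation (7) can be explored numerically. The following minimal Python sketch assumes the one-step recursion Var[ε_t] = ϕε² Var[ε_{t−1}] + σ_z(t)², which follows from the AR(1) definition in equation (4) with independent innovations. The function name `marginal_variance_raw_ar1`, the starting variance `var0`, and all numerical values are illustrative assumptions, not quantities from the case study.

```python
import numpy as np

def marginal_variance_raw_ar1(sigma_z, phi, var0=0.0):
    """Marginal variance of eps_t for eps_t = phi * eps_{t-1} + z_t.

    sigma_z : innovation standard deviations, one per time step
    phi     : lag-1 autoregressive parameter
    var0    : assumed variance before the first time step (hypothetical default)
    """
    sigma_z = np.asarray(sigma_z, dtype=float)
    var = np.empty_like(sigma_z)
    prev = var0
    for t, s in enumerate(sigma_z):
        # One-step form of the sum in equation (7)
        prev = phi ** 2 * prev + s ** 2
        var[t] = prev
    return var

# Homoscedastic check: a long constant series approaches sigma^2 / (1 - phi^2)
phi_const = 0.9
var_const = marginal_variance_raw_ar1(np.full(500, 2.0), phi_const)
limit = 2.0 ** 2 / (1.0 - phi_const ** 2)

# Heteroscedastic example: one large innovation sd at t = 10 (a "peak"),
# small innovation sd everywhere else (a "recession")
sigma_z = np.full(60, 0.1)
sigma_z[10] = 5.0
sd_eps = np.sqrt(marginal_variance_raw_ar1(sigma_z, phi=0.98))

print(f"homoscedastic: {var_const[-1]:.4f} (recursion) vs {limit:.4f} (closed form)")
print(f"sd at peak: {sd_eps[10]:.3f}; sd 20 steps later: {sd_eps[30]:.3f}")
```

In this sketch the residual standard deviation 20 steps after the isolated large innovation remains far above the local innovation standard deviation of 0.1, which illustrates how strongly a value of ϕε close to 1 carries peak-flow uncertainty into later time steps.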

2.1.2. Likelihood Function

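[14] Approach 1 introduces error model parameters $\theta_\varepsilon = (a, b, \phi_\varepsilon)$ and results in the likelihood

$$p(\varepsilon \mid \theta_\varepsilon) = p(\varepsilon_1) \prod_{t=2}^{N} p(\varepsilon_t \mid \varepsilon_{t-1}) \tag{8}$$
$$p(\varepsilon_t \mid \varepsilon_{t-1}) = f_N(\varepsilon_t \mid \phi_\varepsilon\, \varepsilon_{t-1}, \sigma_{z(t)}) \tag{9}$$

where $f_N(x \mid \mu, \sigma)$ denotes the pdf of a scalar Gaussian random deviate $x$ with mean $\mu$ and standard deviation $\sigma$, and $N$ is the number of time steps in the calibration period.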

[15] These equations correspond exactly to the scheme of Schoups and Vrugt [2010] when the latter is applied with no bias, skew, or kurtosis. For the first time step, we use the marginal distribution p(ε1), which as shown above is Gaussian with zero mean and variance given by equation (7).
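
As a rough illustration, the logarithm of the Approach 1 likelihood in equations (8) and (9) could be coded along the following lines. This is a sketch under stated assumptions, not the implementation used in the study: the function names are hypothetical, the innovation standard deviation is taken as `a + b * q_sim`, and the marginal variance at the first time step is approximated by the stationary value σ_z(1)²/(1 − ϕ²) instead of evaluating equation (7) over a warm-up period.

```python
import numpy as np

def gaussian_logpdf(x, mean, sd):
    """Log density of a scalar Gaussian with the given mean and standard deviation."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def loglik_approach1(q_obs, q_sim, a, b, phi):
    """Log-likelihood for an AR(1) model applied to raw residuals (|phi| < 1)."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    eps = q_obs - q_sim                  # raw residuals
    sigma_z = a + b * q_sim              # heteroscedastic innovation sd
    # ASSUMPTION: stationary approximation for the first-step marginal sd
    sd_first = sigma_z[0] / np.sqrt(1.0 - phi ** 2)
    ll = gaussian_logpdf(eps[0], 0.0, sd_first)
    # Conditional terms: eps_t | eps_{t-1} ~ N(phi * eps_{t-1}, sigma_z[t]^2)
    ll += np.sum(gaussian_logpdf(eps[1:], phi * eps[:-1], sigma_z[1:]))
    return float(ll)

# Usage on synthetic data (values are illustrative only)
rng = np.random.default_rng(0)
q_sim = rng.gamma(shape=2.0, scale=1.5, size=200)
q_obs = q_sim + rng.normal(0.0, 0.2 + 0.1 * q_sim)
ll_ar = loglik_approach1(q_obs, q_sim, a=0.2, b=0.1, phi=0.5)
ll_iid = loglik_approach1(q_obs, q_sim, a=0.2, b=0.1, phi=0.0)
print(ll_ar, ll_iid)
```

With `phi=0` the expression reduces to a sum of independent heteroscedastic Gaussian log densities, which provides a simple consistency check.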

2.1.3. Lag-1 Autocorrelation of the Residuals

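[16] The lag-1 autocorrelation of the raw residuals can be derived as

$$\rho(\varepsilon_t, \varepsilon_{t+1}) = \frac{\mathrm{Cov}[\varepsilon_t, \varepsilon_{t+1}]}{\sigma_{\varepsilon(t)}\, \sigma_{\varepsilon(t+1)}} = \phi_\varepsilon\, \frac{\sigma_{\varepsilon(t)}}{\sigma_{\varepsilon(t+1)}} \tag{10}$$

where $\sigma_{\varepsilon(t)}^2$ is the marginal variance of $\varepsilon_t$ given by equation (7).

[17] Equation (10) shows that in Approach 1 the lag-1 autocorrelation of the residuals depends on both the autoregressive parameter $\phi_\varepsilon$ and the form of heteroscedasticity.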
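
For reference, the result in equation (10) follows in two steps from the AR(1) definition in equation (4). The sketch below writes $\sigma_{\varepsilon(t)}^2$ for the marginal variance of $\varepsilon_t$ (equation (7)) and $\sigma_{z(t)}^2$ for the innovation variance; this subscript notation is an assumption made here for readability.

```latex
\begin{align*}
\mathrm{Cov}[\varepsilon_t, \varepsilon_{t+1}]
  &= \mathrm{Cov}[\varepsilon_t,\ \phi_\varepsilon \varepsilon_t + z_{t+1}]
   = \phi_\varepsilon\, \sigma_{\varepsilon(t)}^2
   \quad \text{(since $z_{t+1}$ is independent of $\varepsilon_t$)}, \\
\rho(\varepsilon_t, \varepsilon_{t+1})
  &= \frac{\phi_\varepsilon\, \sigma_{\varepsilon(t)}^2}
          {\sigma_{\varepsilon(t)}\, \sigma_{\varepsilon(t+1)}}
   = \phi_\varepsilon\, \frac{\sigma_{\varepsilon(t)}}{\sigma_{\varepsilon(t+1)}}, \\
\sigma_{\varepsilon(t+1)}^2
  &= \phi_\varepsilon^2\, \sigma_{\varepsilon(t)}^2 + \sigma_{z(t+1)}^2
   \ \ge\ \phi_\varepsilon^2\, \sigma_{\varepsilon(t)}^2
   \quad\Longrightarrow\quad
   |\rho(\varepsilon_t, \varepsilon_{t+1})| \le 1 .
\end{align*}
```

The ratio of consecutive marginal standard deviations varies with the flow-dependent innovation variance, so the autocorrelation of the raw residuals changes from one time step to the next even though the autoregressive parameter is fixed.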

2.2. Approach 2: AR(1) Model Applied to Standardized Residuals

[18] This section considers an alternative heteroscedastic AR(1) model, where the AR(1) assumptions are applied to standardized residuals, rather than to the raw residuals.

2.2.1. Formulation and Basic Properties

[19] Let us define a standardized residual error $\eta_t$ as follows:

$$\eta_t = \frac{\varepsilon_t}{\sigma_{\varepsilon(t)}} \tag{11}$$

where $\sigma_{\varepsilon(t)}$ is the standard deviation of the raw residual at time $t$.

[20] Analogously to Approach 1, we can assume linear heteroscedasticity,

$$\sigma_{\varepsilon(t)} = a + b\, Q_t \tag{12}$$

[21] We now apply the AR(1) process to the standardized residual errors, defining innovations $y_t$ as

$$\eta_t = \phi_\eta\, \eta_{t-1} + y_t, \qquad y_t \sim N(0, \sigma_y^2) \tag{13}$$

[22] Equation (13) corresponds to a standard homoscedastic AR(1) process and, since $\mathrm{Var}[\eta_t] = 1$ due to the standardization in equation (11), it follows that $\sigma_y^2 = 1 - \phi_\eta^2$.
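
The innovation variance implied by the standardization can be obtained in one line. The sketch below assumes a stationary AR(1) process for the standardized residuals $\eta_t$ with autoregressive parameter $\phi_\eta$ and innovation variance $\sigma_y^2$; these symbol names are assumptions made here for illustration.

```latex
\begin{align*}
\mathrm{Var}[\eta_t]
  &= \phi_\eta^2\, \mathrm{Var}[\eta_{t-1}] + \sigma_y^2
  && \text{(innovation $y_t$ independent of $\eta_{t-1}$)} \\
1 &= \phi_\eta^2 \cdot 1 + \sigma_y^2
  && \text{(unit variance at every time step)} \\
\sigma_y^2 &= 1 - \phi_\eta^2 .
\end{align*}
```

The innovation variance is therefore fully determined by the autoregressive parameter and does not require a separate inferred parameter.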

2.2.2. Likelihood Function

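[23] The error model parameters in Approach 2 are $\theta_\varepsilon = (a, b, \phi_\eta)$. The likelihood function, given by the joint pdf of the residuals, $p(\varepsilon)$, must account for the transformation in equation (11)

$$p(\varepsilon) = p(\eta)\, |J| \tag{14}$$

where $|J| = |\det(\partial \eta / \partial \varepsilon)|$ is the absolute value of the determinant of the Jacobian matrix of the transformation in equation (11) and can be derived to be $\prod_{t} \sigma_{\varepsilon(t)}^{-1}$, with the product taken over the calibration time steps.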

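[24] The likelihood function over the $N$ calibration time steps is then

$$p(\varepsilon \mid \theta_\varepsilon) = \frac{p(\eta_1)}{\sigma_{\varepsilon(1)}} \prod_{t=2}^{N} \frac{p(\eta_t \mid \eta_{t-1})}{\sigma_{\varepsilon(t)}} \tag{15}$$

[25] Since we assumed Gaussian innovations $y_t$ in equation (13), we have

$$p(\eta_t \mid \eta_{t-1}) = f_N\!\left(\eta_t \mid \phi_\eta\, \eta_{t-1}, \sqrt{1 - \phi_\eta^2}\right) \tag{16}$$

[26] In addition, from section 2.2.1, the marginal distribution at $t = 1$ is $\eta_1 \sim N(0, 1)$.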
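
A corresponding sketch of the Approach 2 log-likelihood is given below. It assumes a residual standard deviation of `a + b * q_sim`, a unit-variance Gaussian marginal for the first standardized residual, and conditional standard deviation sqrt(1 − ϕ²) for the remaining steps; the Jacobian of the standardization enters as the negative sum of the log standard deviations. The function names are hypothetical and the data are synthetic.

```python
import numpy as np

def gaussian_logpdf(x, mean, sd):
    """Log density of a scalar Gaussian with the given mean and standard deviation."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def loglik_approach2(q_obs, q_sim, a, b, phi):
    """Log-likelihood for an AR(1) model applied to standardized residuals (|phi| < 1)."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    eps = q_obs - q_sim                   # raw residuals
    sigma_eps = a + b * q_sim             # marginal sd of the raw residuals
    eta = eps / sigma_eps                 # standardized residuals
    sd_y = np.sqrt(1.0 - phi ** 2)        # innovation sd implied by unit variance of eta
    ll = gaussian_logpdf(eta[0], 0.0, 1.0)
    ll += np.sum(gaussian_logpdf(eta[1:], phi * eta[:-1], sd_y))
    # Jacobian of the transformation eps -> eta
    ll -= np.sum(np.log(sigma_eps))
    return float(ll)

# Usage on synthetic data (values are illustrative only)
rng = np.random.default_rng(0)
q_sim = rng.gamma(shape=2.0, scale=1.5, size=200)
q_obs = q_sim + rng.normal(0.0, 0.2 + 0.1 * q_sim)
ll2_ar = loglik_approach2(q_obs, q_sim, a=0.2, b=0.1, phi=0.5)
ll2_iid = loglik_approach2(q_obs, q_sim, a=0.2, b=0.1, phi=0.0)
print(ll2_ar, ll2_iid)
```

Omitting the Jacobian term would make the likelihood favor arbitrarily large values of the heteroscedasticity parameters, so it is kept explicit in the sketch.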

2.2.3. Lag-1 Autocorrelation

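[27] The lag-1 autocorrelation of residual errors in Approach 2 is

$$\rho(\varepsilon_t, \varepsilon_{t+1}) = \frac{\mathrm{Cov}[\varepsilon_t, \varepsilon_{t+1}]}{\sigma_{\varepsilon(t)}\, \sigma_{\varepsilon(t+1)}} = \mathrm{Cov}[\eta_t, \eta_{t+1}] = \phi_\eta \tag{17}$$

where we used the result

$$\mathrm{Cov}[\eta_t, \eta_{t+1}] = E[\eta_t (\phi_\eta\, \eta_t + y_{t+1})] = \phi_\eta\, E[\eta_t^2] = \phi_\eta \tag{18}$$

[28] Equation (17) shows that in Approach 2 the lag-1 autocorrelation $\rho(\varepsilon_t, \varepsilon_{t+1})$ is constant in time and corresponds exactly to the autoregressive parameter $\phi_\eta$. This can be contrasted with equation (10) for Approach 1, where the lag-1 autocorrelation has a more complex structure.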

2.3. Similarities and Differences Between Approaches 1 and 2

[29] Although Approaches 1 and 2 represent heteroscedasticity and autocorrelation using similar equations, the order in which these properties are treated is different (Table 1). In Approach 1, an AR(1) model is applied to the raw residuals, followed by the application of a heteroscedastic model to its innovations (by time-varying the conditional variance of the AR(1) process). In contrast, Approach 2 applies a heteroscedastic model to standardize the raw residuals, followed by the application of a homoscedastic AR(1) model to the standardized residuals (i.e., after the heteroscedasticity has been removed, or at least substantially reduced).

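Table 1. Summary of Equations for Approaches 1 and 2
Approach 1 | Approach 2
$\varepsilon_t = \phi_\varepsilon\, \varepsilon_{t-1} + z_t$ | $\eta_t = \phi_\eta\, \eta_{t-1} + y_t$
$\sigma_{z(t)} = a + b\, Q_t$ | $\sigma_{\varepsilon(t)} = a + b\, Q_t$
 | $\eta_t = \varepsilon_t / \sigma_{\varepsilon(t)}$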

Table 2. Catchment Properties and Calibration and Validation Periods
Catchment Name | Number | Area (km²) | Mean Annual Rainfall (mm) | Mean Annual Runoff (mm) | Runoff Coefficient | Percentage of Days with Runoff <1 mm | Calibration Period | Validation Period
Lacmalac | 410057 | 673 | 1182 | 397 | 0.33 | 65.0 | 28 Oct 1976–2 Oct 1978 | 3 Oct 1978–6 Sep 1983
Tinderry | 410734 | 490 | 808 | 106 | 0.13 | 95.3 | 22 Jun 1980–8 Sep 1988 | 9 Sep 1988–19 Apr 1995
French Broad | 3451500 | 2448 | 1413 | 800 | 0.57 | 16.3 | 8 Sep 1973–25 Nov 1981 | 26 Nov 1981–30 Apr 1998

[30] These structural differences lead to important differences in the mathematical behavior of the two approaches. Both approaches include autoregressive equations that accumulate the errors from previous innovations. However, in Approach 1 the innovations are heteroscedastic and can result in particularly large accumulated errors. For example, equation (7) shows that large innovations associated with peak streamflows propagate into the predictive uncertainty of subsequent recession time steps. This propagation is particularly strong when ϕε ≈ 1. Conversely, in Approach 2 the innovations of the AR(1) process are homoscedastic, and the heteroscedasticity in equation (12) is applied after (rather than before) the innovations are accumulated. The practical implications of these differences on the behavior of the error models will be investigated in an empirical case study (section 3).
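
The contrast in error accumulation can be sketched numerically on a synthetic hydrograph. The example below is illustrative only: the flow series, the parameter values, and the choice to scale the Approach 2 standard deviation by 1/sqrt(1 − ϕ²) (so that both approaches have the same marginal standard deviation under steady flow) are assumptions made here, not settings from the case study.

```python
import numpy as np

# Synthetic simulated flow (hypothetical, mm/day): steady baseflow, a peak, a recession
q_sim = np.concatenate([np.full(20, 1.0), [50.0], 50.0 * 0.7 ** np.arange(1, 40)])
a, b, phi = 0.1, 0.2, 0.95            # illustrative values only

# Approach 1: innovations have sd a + b*Q; the marginal variance of the raw
# residual accumulates past innovation variances through the AR(1) recursion
sigma_z = a + b * q_sim
var1 = np.empty_like(q_sim)
var1[0] = sigma_z[0] ** 2 / (1.0 - phi ** 2)   # assumed stationary start
for t in range(1, q_sim.size):
    var1[t] = phi ** 2 * var1[t - 1] + sigma_z[t] ** 2
sd1 = np.sqrt(var1)

# Approach 2: the marginal sd of the raw residual is set directly by the flow.
# ASSUMPTION: scaled so both approaches agree exactly under steady flow.
sd2 = (a + b * q_sim) / np.sqrt(1.0 - phi ** 2)

late = 45                              # a time step late in the recession
print(f"flow at t={late}: {q_sim[late]:.4f}")
print(f"Approach 1 marginal sd: {sd1[late]:.3f}")
print(f"Approach 2 marginal sd: {sd2[late]:.3f}")
```

In this sketch the two approaches coincide during the steady baseflow period, but late in the recession the Approach 1 standard deviation remains many times larger than the Approach 2 value, because the variance injected at the peak decays only at the rate ϕ² per time step.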

[31] Approach 2 can be viewed as a particular case of the common “variance-stabilizing” strategy of transforming residual errors and then specifying a (possibly correlated) homoscedastic probability distribution of the transformed residuals. For example, see equations (A4) and (A5) in Bates and Campbell [2001], where a homoscedastic AR(1) process is applied to Box-Cox-transformed residuals. In Approach 2, the homoscedastic AR(1) process is applied to standardized residuals. Unlike the Box-Cox transformation, the explicit standardization in equations (11) and (12) directly exploits the conditioning on simulated streamflow as an explanatory variable.

[32] In contrast, Approach 1 corresponds to applying a joint “decorrelation” transformation $z_t = \varepsilon_t - \phi_\varepsilon\, \varepsilon_{t-1}$ followed by specifying a heteroscedastic distribution of the transformed residuals $z_t$. It can hence be viewed as a special case of an “autocorrelation-reduction” strategy where the aim of the transformation is to reduce autocorrelation rather than heteroscedasticity.

3. Empirical Case Study

3.1. Hydrological Data, Models, and Methodology

[33] Approaches 1 and 2 are compared in three catchments: Lacmalac and Tinderry (southeastern Australia), and the French Broad River (Asheville, North Carolina). The climatology of these catchments and the daily data periods used in the analysis are listed in Table 2. These catchments provide a range of different climatologies and hydrological regimes: Tinderry is dry (ephemeral), Lacmalac is wet and French Broad River is particularly wet. Simulated streamflow is obtained using GR4J, a lumped rainfall-runoff model with 4 fitted parameters [Perrin et al., 2003]. Uniform priors are used on all inferred quantities, with parameter ranges specified in Table 3. The posterior distributions are optimized using a quasi-Newton method [Kavetski and Clark, 2010] and sampled using a multistage Metropolis algorithm [Thyer et al., 2009].

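Table 3. Parameter Specifications
Group | Parameter | Description | Lower Bound | Upper Bound
Hydrological model (GR4J) | θ1 (mm) | Maximum capacity of the production store | 100 | 20,000
Hydrological model (GR4J) | θ2 (mm) | Groundwater exchange coefficient | −500 | 500
Hydrological model (GR4J) | θ3 (mm) | Maximum capacity of the routing store | 1 | 500
Hydrological model (GR4J) | θ4 (days) | Time base of unit hydrograph | 0.5 | 10
Residual error model | a (mm) | Heteroscedasticity intercept | 0.0001 | 100
Residual error model | b | Heteroscedasticity slope | 0.0001 | 10
Residual error model | ϕ | Autoregressive coefficient | −0.999 | 0.999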

[34] Several performance metrics are used. Statistical reliability is evaluated using the predictive QQ plot [Thyer et al., 2009], and precision is quantified using the average (in time) coefficient of variation of the predictive distribution. The adequacy of the AR(1) approximation is assessed using the autocorrelation function (ACF) of the standardized innovations, i.e., the innovations divided by their standard deviation: $z_t / \sigma_{z(t)}$ for Approach 1 (see equation (4)) and $y_t / \sigma_y$ for Approach 2 (see equation (13)). The ACF at nonzero lags should be as close as possible to 0. We appraise the extent to which the standardized innovations are homoscedastic by plotting them against the quantile of the simulated flow, and the extent to which they are Gaussian by comparing their empirical marginal density to a Gaussian distribution. Finally, we inspect the posterior parameter distributions and the interactions between hydrological and error model parameters.
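
The standardized innovations and their ACF could be computed along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the heteroscedastic standard deviation is taken as `a + b * q_sim` in both approaches, and the ±1.96/sqrt(n) band is the usual large-sample approximation for white noise, which may differ from the confidence intervals used in the figures. The usage example generates synthetic residuals from the Approach 2 model, so the recovered innovations should be close to uncorrelated.

```python
import numpy as np

def standardized_innovations_approach1(eps, q_sim, a, b, phi):
    """z_t / sigma_z(t), with z_t = eps_t - phi * eps_{t-1} (first step dropped)."""
    z = eps[1:] - phi * eps[:-1]
    return z / (a + b * q_sim[1:])

def standardized_innovations_approach2(eps, q_sim, a, b, phi):
    """y_t / sqrt(1 - phi^2), with y_t = eta_t - phi * eta_{t-1} (first step dropped)."""
    eta = eps / (a + b * q_sim)
    y = eta[1:] - phi * eta[:-1]
    return y / np.sqrt(1.0 - phi ** 2)

def acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom for k in range(max_lag + 1)])

# Synthetic residuals generated from the Approach 2 model (illustrative values)
rng = np.random.default_rng(1)
n, a, b, phi = 2000, 0.2, 0.1, 0.8
q_sim = rng.gamma(2.0, 1.5, size=n)
y = rng.normal(0.0, np.sqrt(1.0 - phi ** 2), size=n)
eta = np.empty(n)
eta[0] = rng.normal()                  # stationary N(0, 1) start
for t in range(1, n):
    eta[t] = phi * eta[t - 1] + y[t]
eps = eta * (a + b * q_sim)

nu2 = standardized_innovations_approach2(eps, q_sim, a, b, phi)
rho = acf(nu2, max_lag=10)
band = 1.96 / np.sqrt(nu2.size)        # approximate 95% band for white noise
print("lags outside the band:", np.flatnonzero(np.abs(rho[1:]) > band) + 1)
```

Applying `standardized_innovations_approach1` to the same residuals gives the Approach 1 diagnostic, which allows the two sets of standardized innovations to be compared on equal terms.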

3.2. Visual Assessment of Predictive Performance

[35] As shown in Figure 1, Approach 1 produces very vague predictions, with exceedingly wide and poorly behaved predictive bounds, especially just after the recessions. This behavior is particularly pronounced in the Lacmalac and French Broad River catchments, which are both humid. It appears related to the error accumulation behavior discussed in section 2.3. Another undesirable property of Approach 1 is its exceedingly high proportion of large negative predicted flows. Even if the predictive distribution were truncated at zero to avoid negative flows, the resulting distribution would remain poor. Conversely, Approach 2 does not suffer from these problems and has much better-behaved prediction limits, with only a few small negative flows that can be safely truncated.

Figure 1. Predictive distributions of streamflow during representative portions of the validation period.

[36] For the Tinderry catchment, the predictive bounds obtained with Approaches 1 and 2 are very similar, and contain an exceedingly large fraction of negative flows. Given the ephemeral nature of this catchment, avoiding this degeneracy is likely to require a specialized treatment of low and zero flows, including the development of truncated likelihood functions [Smith et al., 2010].

3.3. Statistical Metrics of Predictive Performance

[37] Figure 2 compares Approaches 1 and 2 using a range of diagnostics. Consistent with the hydrographs in Figure 1, Approach 2 clearly outperforms Approach 1 in the Lacmalac and French Broad catchments, especially in terms of predictive precision (row 2 of Figure 2). Approach 2 also produces near-perfect reliability in the French Broad River, whereas Approach 1 systematically underestimates streamflow (row 1 of Figure 2).

Figure 2. Comparison of predictive performances of Approach 1 (white circles) and Approach 2 (black circles) over the validation period using a range of diagnostics. Statistical reliability (Row 1): predictive QQ-plots (for ease of visualization, the symbols are used to distinguish between the curves rather than to denote individual data points). Precision (Row 2): average (in time) coefficient of variation of the predictive distribution. Heteroscedasticity (Row 3): standardized innovations as a function of quantiles of simulated flow. Distributional check (Row 4): probability density plots of standardized innovations (the black line illustrates the error model assumptions, i.e., a Gaussian pdf with zero mean and unit standard deviation). Autocorrelation (Row 5): autocorrelation function (ACF) plots (with 95% confidence intervals indicated with dotted lines). The residual autocorrelations obtained when the AR(1) error model component is omitted (i.e., Approach 1 with $\phi_\varepsilon = 0$) are shown with a black line.

[38] The error heteroscedasticity appears adequately captured by equations (5) and (12) in the (wet) Lacmalac and French Broad catchments, with the standardized innovations being nearly homoscedastic as required (row 3 of Figure 2). This is not the case in the (ephemeral) Tinderry basin, where there is a clear remaining trend in the relationship between the standardized innovations and streamflow. The curvature in this trend suggests a nonlinear heteroscedastic model warrants investigation.

[39] Density plots of the standardized innovations (row 4 of Figure 2) reveal that they are generally symmetric but kurtotic, which is consistent with earlier studies [Schoups and Vrugt, 2010]. The kurtosis is particularly strong in the Tinderry catchment, where it is combined with a slight asymmetry. The Gaussian assumptions are hence questionable in this case. The standardized innovations obtained with Approach 1 exhibit a slight positive bias for the French Broad catchment, which is not the case when Approach 2 is used.

[40] The autocorrelation structure of the residuals appears well approximated by the AR(1) assumption, with the ACF of the standardized innovations being close to 0 for both Approaches 1 and 2, in all three catchments (row 5 of Figure 2). This clearly contrasts with the large autocorrelations when the AR(1) error model component is omitted, and highlights the need for autocorrelated error models.

[41] In general, Approaches 1 and 2 produce very similar, and generally poor, results in the Tinderry catchment (Approach 1 yields a small gain in reliability while Approach 2 produces slightly higher precision). This can be attributed to the ephemerality of the Tinderry catchment and reinforces the need to improve the hydrological model (e.g., to capture the wetting-up thresholds) and employ a specialized treatment of zero and near-zero flows in the likelihood function [Smith et al., 2010].

3.4. Parameter Inference

[42] Figure 3 shows that the posterior parameter distributions are generally well behaved with both Approaches 1 and 2; most parameters appear well identified. However, the inferences are different and some problematic features are evident. In Approach 1, the GR4J water balance parameter $\theta_2$ is highly negative (indicating export of groundwater from the catchment) and is negatively correlated with the heteroscedasticity slope $b$. In Approach 2, $\theta_2$ is close to 0, though it remains correlated with $b$. Approach 2 also has a strong positive correlation between the autoregressive and heteroscedasticity parameters, which appears absent in Approach 1. Similar results (not shown here) were found in the other two catchments.

Figure 3. Posterior parameter distributions for the Lacmalac catchment.

4. Conclusions and Recommendations

[43] This technical note illustrates the challenges of fitting hydrological model parameters jointly with the autocorrelation and heteroscedasticity parameters of residual error models. An empirical case study was undertaken, based on three catchments with diverse hydrological dynamics and the widely used GR4J hydrological model. Two distinct residual error models based on Gaussian AR(1) processes were compared using a range of diagnostics. The main conclusions are as follows:

[44] (1) When jointly inferring heteroscedasticity and autocorrelation parameters, applying Gaussian AR(1) models directly to the raw residual errors can produce a poorly behaved error model with grossly exaggerated predictive uncertainty. This instability can be avoided by applying the Gaussian AR(1) process to standardized residuals. This empirical finding appears consistent with analytical insights into the error accumulation properties of the two statistical error models.

[45] (2) Hydrological parameters directly controlling the water balance can interact strongly when fitted jointly with heteroscedasticity and autocorrelation parameters. Since the current study employed a single hydrological model, further testing is needed to ascertain how this interaction relates to conclusion 1 above.

[46] (3) Applying the Gaussian AR(1) process to standardized residuals results in strong interactions between autoregressive and heteroscedastic parameters. Further research is required to understand the origin of these interactions and whether they can be eliminated or at least reduced (e.g., using alternative autocorrelation structures, parameter transformations or reparameterizations).

[47] (4) Ephemeral catchments remain particularly hard to model, and in addition to a robust representation of heteroscedasticity and autocorrelation, require a more robust handling of skew, near-zero flows, seasonality and other aspects (confirming previous studies on this topic). Moreover, in all three basins, there is evidence of excess kurtosis in the standardized innovations.

[48] Finding 1 can help hydrologists improve the calibration of hydrological models by avoiding fundamental statistical problems, while Findings 2–4 highlight areas for further research, in particular including the analysis of these issues within more complex error models.