Comment on “The Treatment of Uncertainty in Hydrometric Observations: A Probabilistic Description of Streamflow Records” by de Oliveira and Vrugt

Quantifying uncertainties and errors in hydrometric observations is critical to improve our ability to predict streamflow. de Oliveira and Vrugt (2022, https://doi.org/10.1029/2022wr032263) expand on an earlier publication (Vrugt et al., 2005, https://doi.org/10.1029/2004wr003059) to describe a difference‐based variance estimation method that is, used to estimate the aleatory observation uncertainty in streamflow records directly from a time series of such data. The method is then applied to hourly data from 500+ catchments across the contiguous United States. This comment outlines our concerns that the assumptions needed to effectively use difference‐based variance estimation methods are not always met by hourly streamflow records. We illustrate our concerns through the use of a specific test case. We argue that more work is needed to quantify the individual sources of errors and uncertainties in hydrometric observations.

highlight what we see as the main drawbacks of this method.We deviate from dOV22 in how results are presented (i.e., plotted on both linear and logarithmic axes, instead of only on logarithmic axes), but not in how the results are calculated.The data used in this manuscript are streamflow records from the Bow River at Banff (Alberta, Canada), but our findings apply also to the main example used in dOV22, the Leaf River near Collins (Mississippi, United States).The "Data Availability Statement" section contains a reference to the data and code needed to reproduce both cases.

The dOV22 Difference-Based Variance Estimation Method
In the following sections, we use the same notation as used in dOV22 for clarity.V05 and dOV22 describe streamflow records  ỹ as a time series of errors ϵ superimposed upon the true flow value  () (Equations 1 and 2 in dOV22): where  (, ) denotes that the errors are normally distributed with a mean of 0 and a covariance of Σ ϵ .The covariance is then defined as: where n is the length of the streamflow record  ỹ , and   2   is the variance of the distribution from which individual error values ϵ 1 , ϵ 2 , ϵ n are drawn.The time series of errors ϵ is assumed to consist of independent (i.e., uncorrelated) random variables with a mean of zero and non-constant variance.
Due to the non-linearities typically found in flow processes and rating curves, the errors associated with streamflow records derived from hydrometric measurements tend to increase as the flow magnitude increases (i.e., errors are heteroscedastic; Sorooshian & Dracup, 1980).A simple but common assumption to describe such heteroscedastic errors is a multiplicative linear model.In the case of hourly time steps (indicated with subscript h), dOV22 approximate the variance of the error   2 h on each time step as a fraction α of the flow yth on each time step, including some offset defined as a fraction β of the mean hourly flow m h (Equation 4 in dOV22; see also Equation 6 in dOV22): In order to estimate α and β, V05 and dOV22 attempt to quantify the heteroscedastic variance where is the binomial coefficient, and  Δ  ( ỹh) is a differencing operator of order k, applied to k + 1 consecutive time steps of    .For the specific case k = 3 used in dOV22 these equations become: ) −1 = 0.05 (5) The local    2 h estimates resulting from this method can be highly variable.To address this, dOV22 introduces a smoothing approach that (a) sorts the streamflow record h estimates as "step 1," and the smoothing of these estimates as "step 2." dOV22 define the requirements for using this difference-based variance estimation method as "(i) the data-generating process  () is sufficiently smooth, and (ii) the measurement frequency is high compared to the typical timescale of  () ."An interpretation of these assumptions is that, if the measurement frequency is so high that the true values  () (Equation 1) are close to constant between consecutive measurement times, any deviations in  ỹ between these time steps originate mostly in ϵ.The rationale of dOV22 is that the difference of   h between these consecutive time steps gives an estimate of the magnitude of local error variance    2 h and that this can be used to approximate ϵ.

Analysis of the Variance-Based Difference Estimation Method
For consistency with the notation used in the figures in dOV22, the following sections discuss both the local variance estimates    2 h and the corresponding standard deviation estimates    .Note that while subscript h is replaced with subscript t , the analysis is still performed on hourly data.

𝑨𝑨 σ𝝈𝒕𝒕 Estimates
Difference-based variance estimation approaches have merit when the measurement errors are uncorrelated in time and when the measurement error ϵ is large compared to changes in the signal  () between consecutive time steps.These assumptions do not always hold.For example, in Figure 1a values in the streamflow record increase from approximately 96 m 3 s −1 to approximately 179 m 3 s −1 over the period of 24 hr, as measured between 14-Jun 12:00 and 15-Jun 12:00 (indicated with arrows on the x-axis).In this case, changes in the streamflow record  ỹ (Equation 1) are dominated by changes in the true signal  () and not by the error ϵ.The difference-based approach in Equation 4 therefore gives an inappropriate local    2 h value because assumption ii ("the measurement frequency is high compared to the typical timescale of  () ") no longer holds.It is clear in Figure 1a (orange lines) that the local    values tend to be particularly high when the hydrograph changes rapidly, and not, as the error model used by V05 and dOV22 suggests it should be (see Equation 3), when the largest flow magnitudes occur.Due to the nature of hydrologic processes, hydrographs tend to change rapidly during the rising limbs of flood peaks and in such cases Equation 4 thus estimates the largest    values during the periods with the most rapid rate-of-change, which tend to coincide with medium (and not the highest) flows.For any regime with recurring sharply rising hydrographs, such as the snowmelt-driven regime shown in Figure 1a, this behavior may be expected to repeat yearly.   values associated with the sorted streamflow record using a moving window of length m = 100 on either side of the data point.However, Figure 1b (blue squares) shows that even with such a moving window the average error estimates are highest during medium flows and decrease in magnitude for the highest flows.It may be argued (as one reviewer of this comment suggested) that the highest    estimates should be considered outliers and excluded from the moving window average (e.g., through the use of a Theil-Sen slope estimator described by Helsel et al. (2020, ch. 10), though such an approach would not necessarily lead to error estimates that conform to a linear multiplicative error model either).However, our main concern is not how these outliers are treated, but that the difference-based variance estimation method generates local    estimates that need to be treated as outliers at all.In general, defining outliers should be based on some physical understanding of their occurrence that also explains why these values need to be discarded.Specifically to this case, these outliers cluster around medium flow magnitudes solely as an artifact of the method used to generate the error estimates.This implies that there are limits to the applicability of difference-based variance estimation methods for use with hydrologic time series.

Previous Assessments of Difference-Based Variance Estimation With Streamflow Records
In summary, though dOV22 assumes a linear heteroscedastic error model (Equation 3), the difference-based variance estimation method returns local error estimates that behave quite differently, and smoothing these local estimates with a moving window average does not change this.We speculate that this behavior was not obvious in V05 and dOV22 due to two separate reasons.
First, both V05 and dOV22 rely at least in part on a visual interpretation of the generated    estimates to judge if these estimates follow their assumed hetereoscedastic error model.V05 and dOV22 plot streamflow records    against    estimates on a double logarithmic scale.To illustrate this point, Figure 1c shows the same data as Figure 1b but plotted on such a double logarithmic scale.As Figure 1c shows, the logarithmic scale plots the medium flows (with the highest error estimates) at the right of the plotting area and reduces the vertical space used by this part of the data range.Because the highest half of the range of the streamflow record now takes up less than a quarter of the plotting area, it is more difficult to visually distinguish the distribution of    estimates during the medium and high flows.
Second, V05 and dOV22 test the difference-based variance estimation method with synthetic test cases, generated by corrupting model simulations of river flow with uncorrelated error values drawn from a normal distribution with zero mean and heteroscedastic variance.For larger values of α and β, the uncorrelated random nature of these synthetic errors creates time series that are not as smooth as real streamflow records (compare the insets in Figures 1a and 2a, showing an observed flow peak and the same peak corrupted with such synthetic error values, respectively).The heteroscedastic nature of these errors ensures that these perturbations are higher around higher flows than they are around lower flows.This approach thus generates artificially large rate-of-change values 10.1029/2022WR034294 5 of 7 in the hydrographs used to test the variance-based error estimator, with a tendency to produce larger artificial rate-of-change values in regions with higher flow.These synthetic test cases thus tend to show higher    values for higher flow magnitudes (Figures 2a-2c), but it is debatable whether the jagged nature of these synthetic test cases makes them appropriate test cases for using difference-based variance estimation methods with real, much smoother, hydrographs.

Discussion
The difference-based variance estimation method relies on the combined assumptions that the time between consecutive measurements is small enough, and that the measurements are accurate enough, that any differences between two consecutive values in the streamflow record  ỹ originate mostly in the error ϵ (Equation 1).The local    estimates we obtained with this method for a real hydrograph (Figures 1a-1c) are highest for medium flow magnitudes.This same pattern can be seen to some extent in Figure 6 in V05 and in Figure 2 in dOV22, though this is harder to see on the double logarithmic scales used in those figures.These highest    estimates for medium flows are inconsistent with the heteroscedastic error model assumed by V05 and dOV22 for real hydrographs (Equation 3), and seem to appear in cases where the difference between flow values at consecutive time steps originate mostly in the true signal  () rather than in error ϵ (e.g., during rising and falling limbs when the rate of change of flow is high).This suggests that, if we assume that Equation 3 is an appropriate description of the relationship between flow and error magnitudes, at least in some cases hourly streamflow records do not have the temporal density necessary to effectively use difference-based variance estimation methods.
A further complication is that the difference-based variance estimation method is divorced from current understanding of the sources of errors in streamflow records.Streamflow records are affected by many different errors sources (e.g., limited accuracy of water level sensors, interpolation and extrapolation of rating curves), and many of these error sources are not constant in time (e.g., rating curve shifts, river ice, high flows leaving river embankments, unsteady flow conditions, wind-driven surface waves).It is possible that a multiplicative linear error model based on magnitude (Equation 3) does not describe each individual error source equally well, and it is debatable whether an individual error source, in this case aleatory (i.e., random) errors, can effectively be extracted from time series that include multiple different error sources without accounting for other errors.
Depending on the (unknown) characteristics of the true discharge  () and the error ϵ, it is possible that difference-based variance estimators can be used for gauges with higher sampling frequencies than hourly, or that hourly sampling is in fact an appropriate sampling frequency for gauges with gradually changing hydrographs.However, it is our opinion that neither V05 nor dOV22 provide compelling evidence that these methods can be applied to any arbitrary collection of hourly streamflow records, and this is of particular importance in the case of dOV22 where these methods are applied to 500+ watersheds across the contiguous United States.

Summary
Obtaining correct estimates of hydrometric observation uncertainties is critical to advance hydrologic simulation and prediction.Future research directions likely involve investigating and characterizing the individual sources of error that make up the total error ϵ (e.g., Coxon et al., 2015;Di Baldassarre & Montanari, 2009;Gharari et al., 2023;Kiang et al., 2018).Meanwhile, approaches such as the simple error model assumed in Equation 3 and the difference-based variance estimation method described in Vrugt et al. (2005) and de Oliveira and Vrugt (2022) have their place, provided that their limitations are understood.We argue that one of the critical assumptions of the difference-based variance estimation method, namely that the measurement frequency of the true signal  () is high compared to typical timescale of changes in  () , may not generally apply to hourly streamflow records, and that this leads to error estimates that are inconsistent with the linear heteroscedastic error model assumed in Equation 3. Specifically, the difference-based variance estimator of dOV22 seems to return large local estimates   during periods of high rate of change in the flow regime.We believe that treating such values as outliers that can be ignored or smoothed away is inappropriate, and instead believe that these values indicate that there are limits to how and where difference-based variance estimators can safely be applied in discharge records.We urge caution about the use of these difference-based variance estimation methods until such time that the hydrologic interpretation of their requirements is better understood.

Data Availability Statement
Streamflow records used in the preparation of Figures 1 and 2 was extracted from the Environment and Climate Change Canada Real-time Hydrometric Data web site (https://wateroffice.ec.gc.ca/mainmenu/real_time_data_ index_e.html) on 22 December 2021.Real-time data for the Bow River at Banff can be accessed through clicking Station Search and entering Station Number 05BB001, for the time period between current, and current minus 18 months.A copy of the exact data and code to reproduce Figures 1 and 2 are available through Zenodo (Knoben, 2023).the editors of Water Resources Research for providing a space for this comment.Wouter Knoben and Martyn Clark were supported by the Global Water Futures program, University of Saskatchewan.We thank one anonymous reviewer and Rory Nathan for their constructive comments on early versions of this paper.We appreciate the reply to our comment and thank Drs. de Oliveira and Vrugt for taking the time to address and clarify some of our concerns.We have made minor changes to our comment to improve clarity after seeing the reply from de Oliveira and Vrugt.

Figure 1 .
Figure 1.Difference-based variance estimates of streamflow record error using 9 months of streamflow values at the Bow at Banff gauging station in Alberta, Canada.Data were obtained at 5-min temporal resolution and resampled into hourly observations.Error estimates    were obtained using Algorithm S1 in dOV22.(a) Streamflow record and estimated    values, showing that the highest errors values are estimated during periods of the most rapid flow changes.The inset shows flow during the highest flow peak.(b) Streamflow record and estimated errors plotted on a linear scale, showing a tendency to estimate the highest errors during medium flows, both without (gray dots) and with smoothing (blue squares).The Y-axis is capped at 0.5 to better show the estimated error pattern; some    values in the medium flow range fall outside the plotting area.(c) Streamflow record and estimated errors plotted on a logarithmic scale, making it more difficult to distinguish the error distribution in the medium and high flow ranges.

Figure 2 .
Figure 2. Difference-based variance estimates of streamflow record error using 9 months of streamflow values at the Bow at Banff gauging station in Alberta, Canada.Data were corrupted with synthetic errors drawn from an uncorrelated random normal distribution with variance  ( + ) 2 (Equations 3 and 4 in dOV22).(a) Synthetic streamflow record and estimated    values, showing a very different distribution of error estimates compared to those obtained with uncorrupted data (Figure 1).The inset shows synthetic flow during the highest flow peak.(b) Synthetic streamflow record and error estimates plotted on a linear scale.(c) Synthetic streamflow record and error estimates plotted on a logarithmic scale.

2: Effectiveness of 𝑨𝑨 σ𝝈𝒕𝒕 Smoothing dOV22
argues that the estimated error values    for individual data points are not particularly robust and that some form of smoothing of the    values is needed to calculate average error estimates with greater temporal support.dOV22 implements this by first sorting the values in the streamflow record    by magnitude, and then averaging the