Uncertainties of Annual Suspended Sediment Transport Estimates Driven by Temporal Variability

The majority of sediment transported from rivers to the global oceans is moved in suspension as fine particles. The transported sediment thereby shapes the physical environment through erosion and deposition. Temporal variations in sediment supply and transport lead to unquantified uncertainties in annual load estimates, requiring high-resolution data sets and a sound understanding of site-specific catchment characteristics. We investigate the temporal variability of suspended sediment transport in four catchments in Germany with highly different discharge regimes and catchment sizes (<1,000 km² to >100,000 km²). The data set consists of high-resolution 15-min turbidity measurements, daily discharge, and frequent manual sampling. Using a bootstrap approach based on the 15-min time series, we assessed the impact of the sampling interval on annual load estimates for less frequent data sets. We use the sediment load exceedance time (Ts80%) as a measure of variability and relate it to uncertainties in annual load estimates. Since low-frequency data sets rely on sediment rating curves, we performed a sensitivity analysis of the rating parameters a, b, and ε. Our results indicate a negative exponential relationship between Ts80% and uncertainties in annual load estimates. Based on Ts80%, we can derive the lowest sampling frequency necessary to obtain annual load estimates with an error of <20% across varying discharge regimes. Additionally, Ts80% is linked to the rating exponent b: low b-values indicate high Ts80% values and lower variability, whereas high b-values indicate higher variability.

• Negative exponential relationship between sediment load exceedance time and uncertainties in annual load estimates
• Optimal sampling frequency can be derived from the uncertainty-sediment load exceedance time function
• Sediment rating exponent b is directly linked to sediment load exceedance time

Supporting Information:
Supporting Information may be found in the online version of this article.
and evaluating the effect of sediment supply and siltation with respect to the lifetime of reservoirs (Pratson et al., 2008). Currently, almost half of the world's rivers are affected by dams and reservoirs, with large reservoirs trapping up to 85% of the incoming total sediment load and, in many cases, 100% of the coarse material transported as bedload (Frings et al., 2019; Vörösmarty et al., 2003; Walling, 2012).
Monitoring SSC and sediment transport is challenging owing to the strong spatial and temporal variability in riverine systems and the resulting uncertainties of load estimates (Morehead et al., 2003; Walling & Webb, 1985). Spatial variability of suspended sediment is present on different scales, including (a) the river cross-section, with vertical and lateral gradients (Lamb et al., 2020; Lupker et al., 2011) that lead to unquantified bias when point measurements are upscaled to cross-sections (Haimann et al., 2014); (b) longitudinal differences (Güneralp & Rhoads, 2008) due to sediment deposition and remobilization along the flow path (especially in tidal and estuarine environments) (Nowacki et al., 2015); and (c) catchment-scale variability due to the spatial heterogeneity of surface runoff and soil erosion (Milliman & Farnsworth, 2011b). Even though spatial and temporal variability are linked, in the remainder of this study we focus on temporal variability; spatial variability will be addressed in a forthcoming paper. For many rivers, the majority of the sediment load (up to 90% of the annual load) is transported in just a fraction of the time (sometimes less than 10%); quantifying temporal variability is therefore essential (Frings et al., 2019; Hoffmann et al., 2017; McKee et al., 2006; Meade, 2010; Syvitski et al., 2000). This variability is even more pronounced in small mountainous catchments (Battista et al., 2022) and (semi-)arid environments (McKee & Gilbreath, 2015), where interannual variability at single monitoring stations greatly complicates the estimation of representative annual loads. Methods that quantify the variability of discharge and sediment transport as flow and load flashiness from flow duration curves are well established (Meybeck et al., 2003; Moatar et al., 2013, 2020). Moatar et al. (2020) quantify the variability of transport and flow by the proportion of flow and load discharged during a given percentage of time (e.g., the highest 2% of flow and sediment transport).
Traditionally, suspended sediment in rivers is monitored using manual water samples, which is not only labor intensive, time consuming, and hazardous during flood events but also often lacks comparability, consistency, and high temporal resolution (Czuba et al., 2015; Meybeck et al., 2003; Morehead et al., 2003). This becomes particularly evident in rivers with strong hysteresis between suspended sediment transport and discharge. In these rivers, SSC may vary by an order of magnitude for a given discharge. These differences lead to significant uncertainties in annual suspended sediment loads if data acquisition is infrequent and misses short pulses of sediment transport (J.R. Gray & Gartner, 2009; McKee & Gilbreath, 2015; Meybeck et al., 2003; Skarbøvik & Roseth, 2014). Resolving hysteresis patterns with infrequent manual sampling is challenging, but surrogate sensors with high temporal resolution show clear benefits: they capture hysteresis at various temporal scales, reduce the number of manual samples, and thus reduce monitoring costs, including sample collection and analysis (J.R. Gray & Gartner, 2009).
Surrogate sampling methods use indirect and/or non-intrusive sampling techniques and instruments. New (hydrological) instruments extend data acquisition by means of optical backscatter (OBS) and acoustic backscatter (ABS) measurements (Gartner, 2002; Gay et al., 2014; J.R. Gray & Gartner, 2009; Sirabahenda et al., 2019), satellite remote sensing (Chen et al., 2010; Wang & Lu, 2010), and laser diffraction for in situ particle size distributions (Czuba et al., 2015). Although surrogate sensors increase comparability through standardized data, continuous calibration is necessary to translate the surrogate measurement (e.g., turbidity) into the target parameter (e.g., SSC). Uncertainties in calibration are induced by a multitude of factors affecting the surrogate measurement. For instance, turbidity is primarily controlled by SSC, but grain size, density, organic matter, and color may strongly affect the SSC-turbidity relationship, so that turbidity may vary strongly for a given SSC (Czuba et al., 2015; Hoffmann et al., 2017; Mao et al., 2019; Ziegler et al., 2014). High organic matter content during warm summer months strongly influences the flocculation of the suspended sediment load and therefore affects the grain size of suspended particles, which introduces bias into the turbidity-SSC calibration (Hoffmann et al., 2020; Lamb et al., 2020). Furthermore, uncertainty in OBS and ABS records results from the quality of these records, which is limited by measurement errors in harsh natural environments (e.g., fouling of turbidity probes, disturbance of the light and acoustic paths by floating matter) (Czuba et al., 2015). Nonetheless, high-frequency surrogate sampling shows distinct advantages over manual sampling, mainly owing to its consistency and continuous data acquisition (J.R. Gray & Gartner, 2009).
10.1029/2022WR032628

Despite recent developments, many monitoring stations still rely on traditional monitoring approaches with insufficient temporal sampling intervals for SSC. At the majority of monitoring stations, discharge (Q) is measured at least daily and can be used to establish sediment rating curves that predict SSC, and hence suspended sediment loads, based on a power law:

$$\mathrm{SSC} = a\,Q^{b} + \varepsilon \qquad (1)$$

with site-specific rating coefficient a, exponent b, and error term ε (Hoffmann et al., 2020; Horowitz, 2003). a corresponds to the (expected) SSC at Q = 1. ε is a normally distributed random error associated with, for example, sampling schemes and analyses as well as natural processes (Cohn et al., 1992). ε is quantified to assess the reliability of the regression and to characterize the scatter around the trend line of the suspended sediment rating curve (Warrick, 2015). The rating exponent b determines the steepness of the rating curve and is often interpreted as the reactivity of the river system to changes in Q (Hoffmann et al., 2020; Syvitski et al., 2000); rivers with a steep rating relationship (i.e., large b-values) are considered more reactive, with strong changes in SSC in response to changing Q (Moatar et al., 2013; Warrick, 2015). Thus, rivers with large b-values show a flashier regime (i.e., more pronounced transport of suspended sediment during flood events) than rivers with lower b-values but a similar discharge regime (Meybeck & Moatar, 2012). The range of exponent b has been assessed in previous studies (Hoffmann et al., 2020; Syvitski et al., 2000) and is connected to climate (e.g., from b = 0.2 in arid to b = 2.5 in humid, temperate climates) as well as to catchment characteristics, such as steepness and basin size (Moatar et al., 2020; van Dongen et al., 2022). Many measurement stations show a more complex rating relationship than the simple power law of Equation 1. For instance, variations in exponent b have been addressed using lowess (locally weighted scatterplot smoothing) and NLS (non-linear least squares) regressions, as well as breakpoints that segment the rating curve by discharge, providing a better fit than a single power law covering the full range of observed discharges (Hoffmann et al., 2020; Meybeck & Moatar, 2012; Raymond et al., 2013). Walling and Webb (1985) conclude that the effectiveness of load estimates for different sampling schemes and frequencies can be evaluated based on their accuracy (comparison with a reference load) and precision (scatter of individual load estimates). This concept has since been applied to various data sets (e.g., Horowitz, 2003; Moatar et al., 2013, 2020). In particular, Phillips et al. (1999) investigated the effect of sampling frequency on annual load estimates from 15-min measurements using 22 calculation methods. That study relates sampling frequency and imprecision to basin size; however, basin sizes range only from <1,000 km² to <4,000 km² (Phillips et al., 1999). Raymond et al. (2013) apply a similar approach to a much larger data set and a variety of basin sizes but lack high-resolution data. Based on daily samples, they establish a relationship between rating exponent b and load flashiness. Comprehensive studies on the effect of rating parameters, their link to catchment characteristics, and the temporal variability of suspended sediment transport exist (Moatar et al., 2020; Raymond et al., 2013; Warrick, 2015). However, to our knowledge, no sensitivity analysis of rating parameters with respect to the temporal variability of suspended sediment transport has been applied to an existing data set with high-frequency (15-min) measurements of turbidity/SSC covering a wide range of catchment sizes (<1,000 km² to >100,000 km²) and characteristics (steep alpine rivers to large, anthropogenically impacted waterways).
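A short worked example, derived directly from Equation 1 with ε omitted (it is an illustration, not a result of the cited studies), makes the interpretation of b as reactivity concrete. For a doubling of discharge the coefficient a cancels, so the response of SSC and of the instantaneous load Q·SSC depends on b alone; the two b-values below are the climatic end members quoted above.

```latex
\begin{aligned}
\frac{\mathrm{SSC}(2Q)}{\mathrm{SSC}(Q)} &= \frac{a\,(2Q)^{b}}{a\,Q^{b}} = 2^{b},
\qquad
\frac{2Q\,\mathrm{SSC}(2Q)}{Q\,\mathrm{SSC}(Q)} = 2^{1+b} \\
b = 0.2 &:\quad 2^{0.2} \approx 1.15, \qquad 2^{1.2} \approx 2.30 \\
b = 2.5 &:\quad 2^{2.5} \approx 5.66, \qquad 2^{3.5} \approx 11.31
\end{aligned}
```

A doubling of Q thus raises the sediment flux roughly 2.3-fold in the low-b case but more than 11-fold in the high-b case, which is why large b-values concentrate transport in flood events.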
We address three main issues to help reduce uncertainties in (annual) load estimates and to refine (existing) monitoring approaches:
1. Based on high-resolution time series of SSC, we assess the impact of the sampling interval on the uncertainty of annual loads to derive the optimal sampling interval. To this end, we use a high-resolution (15-min) reference SSC data set and gradually subset it to lower sampling frequencies. The uncertainty of annual load estimates is then determined with a bootstrap approach.
2. We use SSC stations with different degrees of Q variability to link the optimal sampling interval to SSC variability. We define this variability as the time necessary to carry 80% of either the discharge (Tw80%) or the sediment load (Ts80%) and establish a relationship between Ts80% and the uncertainty in annual load estimates.
3. We perform a sensitivity analysis to assess the impact of sediment rating parameters on the uncertainty of annual loads and link the rating parameters to the variability of sediment load (Ts80%).

Sampling and Measurement
Here, we analyze time series of Q and SSC from four monitoring stations located on the rivers Rhine, Moselle, Ilz, and Ammer in Germany (Figure 1). The first two stations are maintained by the German Waterways and Shipping Administration (WSV) and the Federal Institute of Hydrology (BfG). The stations on the Ilz and the Ammer are managed by the Bavarian Environment Agency (LfU). The stations were selected based on their topographic and hydrological characteristics as well as the availability of high-resolution 15-min sensor data. Surrogate records are obtained from turbidity sensors (optical backscatter sensors, OBS; type HACH Solitax ts-line sc). The OBS sensors measure turbidity between 0.001 and 4,000 NTU based on the ISO 7027 standard (using scattering of infrared light at a 90° angle). The sensors are mounted on a floating dock (KAA, COC) or attached to a folding rack (KAL, WEI) and are submerged approximately 1 m below the surface (Hillebrand et al., 2015; LfU, 2012). All OBS sensors are equipped with automatic wipers and are cleaned regularly to reduce bias induced by biofouling. The sensors are calibrated with turbidity standards at least once a year (Hillebrand et al., 2015). The quality of the turbidity data is controlled in terms of the "smoothness" and consistency of successive measurements. In cases of extreme variability (e.g., disturbances of the light path caused by floating matter or strong turbidity drifts caused, for instance, by biofouling), turbidity values are deleted, strongly smoothed, or interpolated to avoid unrealistically high values (turbidity at the four stations is typically <20 FNU when discharge is below average, increases to several hundred FNU during floods, and rarely exceeds 1,000 FNU). The overall number of missing values (due to maintenance or technical issues) is low (Table 1). At KAA and COC, samples for OBS calibration are taken with two 1-L bottles (Hillebrand et al., 2015). At KAL and WEI, a 0.5-L bottle is used (LfU, 2012). Calibration samples for all stations are taken close to the sensor at approximately weekly intervals (see Table 1). Sampling is intensified during high discharge (HD) events and can be reduced during low/medium water stages (Hillebrand et al., 2015). For each station, SSC is calculated for each 15-min turbidity measurement using a calibration function $\mathrm{SSC} = a \cdot \mathrm{TR}^{b}$, where SSC is given in mg L⁻¹ and turbidity TR in FNU; a and b are derived from an NLS regression using all calibration data of the station. Station data and in-depth background information, along with the number of surrogate and manual samples, are given in Table 1 and Table S1 in Supporting Information S1.
The station KAA is located 2 km upstream of the confluence of the Rhine and Moselle. The contributing drainage basin is highly heterogeneous in its geomorphological, climatic, and anthropogenic (e.g., embankments, barrages, and groins) characteristics (Frings et al., 2019; Hoffmann et al., 2017). The station COC is located in the terraced landscape of the Rhenish Massif (Cordier et al., 2006). The river Moselle has a pluvial hydrological regime with rapid discharge responses to single rainfall events (Sutari et al., 2020). The rivers Ammer and Ilz have distinctly smaller catchments and contrast clearly with the previous two. The Ilz is characterized by a wild, near-natural riverscape connecting the diverse habitats of the Bavarian Forest and the Danube Valley. The Ammer originates in the Alpine Foreland, with strong glacial influences in its headwaters. In its lower course, the Ammer has been altered by human interventions (i.e., channel regulation, embankments, and hydropower installations), resulting in a heterogeneous catchment (Suske & Schnetzer, 2017).
The present selection of stations allows the comparison of different discharge regimes and degrees of variability, which are driven by the size of the contributing catchments and their different climatic and geomorphic conditions. The Rhine represents the most sluggish riverine system in this selection and is characterized by the lowest flow and load flashiness. The smaller rivers with low-mountain (station KAL) or alpine (station WEI) discharge regimes show higher variability in discharge and suspended sediment transport (Table 1) owing to more rapid responses to hydrological changes.

Estimation of Annual Loads for Various Sampling Intervals
To evaluate annual sediment load estimates, the impact of the sampling interval must be known. The impact of the sampling frequency on load estimates has been studied in various publications (Horowitz et al., 2015; Moatar et al., 2020; Richards & Holloway, 1987; Skarbøvik et al., 2012). These studies agree that an increase in sampling frequency increases accuracy (i.e., reduces systematic bias) and precision (i.e., reduces the variability of errors) and therefore decreases uncertainties in annual load estimates (Walling & Webb, 1985).
Our approach uses high-resolution turbidity measurements with a 15-min sampling interval to calculate reference loads (in kt yr⁻¹) based on the following equation:

$$Q_{s,\mathrm{ref}} = k_s \sum_{i=1}^{N} Q_i\,\mathrm{SSC}_i \qquad (2)$$

where Q_i and SSC_i are the instantaneous discharge and suspended sediment concentration measurements in m³ s⁻¹ and mg L⁻¹, respectively, and N is the number of 15-min measurements in the year. k_s is a conversion factor accounting for the sampling interval and the measurement units. In our case, k_s = 60 × 15 × 10⁻⁶ (900 s per interval; since 1 mg L⁻¹ = 1 g m⁻³, this yields t, and a further factor of 10⁻³ converts to kt). Annual loads calculated from instantaneous 15-min measurements are used as reference loads to evaluate the effect of (lower) sampling frequencies on annual loads. To this end, we randomly selected measurements from each day, week, and month to represent the following sampling intervals: 1 sample per day, 3 samples per week, 1 sample per week, 3 samples per month, and 1 sample per month, resulting in 365, 156, 52, 36, and 12 "measurements" per year (different criteria for random selection are tested in Figure S1 of Supporting Information S1). For each sampling interval with n measurements per year, the annual load Qs (in kt yr⁻¹) was then estimated using:

$$Q_s = 0.0864 \cdot \frac{365.25}{n} \sum_{i=1}^{n} Q_i\,\mathrm{SSC}_i \qquad (3)$$

This method is similar to interpolation method 2 of Walling and Webb (1985), which is based on instantaneous SSC_i and Q_i values, and was chosen because it is mathematically similar to Equation 2. The coefficient 0.0864 (= 86,400 s d⁻¹ × 10⁻⁶) converts the product of Q (m³ s⁻¹) and SSC (mg L⁻¹) into a daily load in t d⁻¹ (J.R. Gray & Simões, 2008); as in Equation 2, a further factor of 10⁻³ converts to kt. The only difference between Equations 2 and 3 is the sampling interval, which is accounted for by n (the number of measurements per year); for n = 35,064 (365.25 × 96 15-min values), 0.0864 × 365.25/n equals k_s. Thus, the reference load (Qs,ref) calculated from the 15-min sensor data (Equation 2) is directly comparable to annual loads calculated with a lower sampling interval (Equation 3).
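The two load formulas described above (Equations 2 and 3) can be written as a short sketch. The function names are hypothetical, the example year with constant Q and SSC is invented purely to check the arithmetic, and the division by 10³ reflects the unit conversion from t to kt.

```python
K_S = 60 * 15 * 1e-6   # Equation 2: 900 s per 15-min interval x 1e-6 (g -> t)
K_D = 0.0864           # Equation 3: 86,400 s/d x 1e-6, (m3/s * mg/L) -> t/d
DAYS = 365.25


def reference_load_t(q, ssc):
    """Equation 2: annual load (t) from the complete 15-min series of a year."""
    return K_S * sum(qi * ci for qi, ci in zip(q, ssc))


def sampled_load_t(q, ssc):
    """Equation 3: annual load (t) from n instantaneous samples of a year."""
    n = len(q)
    return K_D * DAYS / n * sum(qi * ci for qi, ci in zip(q, ssc))


# Hypothetical year: constant Q = 100 m3/s and SSC = 50 mg/L at 35,064 instants
q = [100.0] * 35064
ssc = [50.0] * 35064
ref_kt = reference_load_t(q, ssc) / 1e3                   # about 157.8 kt
weekly_kt = sampled_load_t(q[::674], ssc[::674]) / 1e3    # ~weekly subset (53 instants)
```

For a constant series both formulas return the same load, which illustrates why Equation 3 collapses to Equation 2 when every 15-min value is used.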

Table 1 Summary of Station Data
For the random subsets with sampling intervals longer than 1 day (i.e., frequencies <1/day), we added a second method for calculating annual loads. Here, we use mean daily discharge data to evaluate whether the addition of (daily) discharge data improves annual load estimates from infrequent SSC sampling. Based on mean daily discharge, we interpolated SSC to daily values using a sediment rating derived by lowess regression. Lowess regression has the major advantage that it can represent more complex SSC-Q relationships than a power law. For daily interpolated SSC values, annual loads are calculated using Equation 3 with n = 365.25. Consequently, both methods are directly comparable, and the impact of daily discharge data on sediment rating can be assessed.
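The daily interpolation step can be sketched with a simplified lowess-type rating. This is a minimal sketch under several assumptions: it performs a single tricube-weighted local linear fit per evaluation point and omits the robustness iterations of full LOWESS, the fit is done in log10-log10 space (the text does not state the transformation used), the smoothing span `frac = 0.5` is arbitrary, and all function names are hypothetical.

```python
import math


def local_linear(x, y, x0, frac=0.5):
    """One LOWESS-type evaluation at x0: tricube-weighted linear fit on the
    ceil(frac * n) nearest points (no robustness iterations)."""
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))
    h = sorted(abs(xi - x0) for xi in x)[k - 1]
    h = h * 1.0001 + 1e-12   # widen slightly so the k nearest points get weight > 0
    w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def daily_ssc(q_sampled, ssc_sampled, q_daily, frac=0.5):
    """Interpolate daily SSC from mean daily Q with a rating fitted in
    log10-log10 space (assumes all Q and SSC values are positive)."""
    lq = [math.log10(v) for v in q_sampled]
    ls = [math.log10(v) for v in ssc_sampled]
    return [10 ** local_linear(lq, ls, math.log10(qd), frac) for qd in q_daily]


def annual_load_daily_t(q_daily, ssc_daily):
    """Equation 3 with n = 365.25, i.e. 0.0864 * sum(Q * SSC), in t."""
    return 0.0864 * sum(q * c for q, c in zip(q_daily, ssc_daily))
```

Because the rating is re-fitted locally around each daily discharge, curvature in the SSC-Q cloud is followed without prescribing a single exponent b; for a production workflow a maintained LOWESS implementation would be preferable.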
The random selection of n measurements from the high-resolution 15-min time series was replicated 100 times, resulting in 100 Qs estimates for each sampling interval, year, and station, which allows the variability of Qs to be evaluated. This bootstrap procedure systematically generates synthetic low-frequency data sets. The respective Qs values for each year were normalized to the Qs,ref derived from the 15-min time series:

$$Q_{s,n} = \frac{Q_s}{Q_{s,\mathrm{ref}}} \qquad (4)$$

We analyzed the distributions of Qs,n using the bias (i.e., the deviation from 1) and the precision (i.e., the standard deviation of Qs,n, denoted ∆Qs,n) for each sampling interval to identify the most infrequent sampling for which ∆Qs,n is smaller than a predefined variability/uncertainty (e.g., 10% or 20%). The workflow can be applied to any existing data set and time series to quantify the uncertainties of annual loads (Horowitz et al., 2015; Morehead et al., 2003; Skarbøvik et al., 2012).
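The bootstrap described above can be sketched as follows. Assumptions: the year is split into n equal blocks and one random instant is drawn per block, a hypothetical stand-in for the day-, week-, and month-based selection; bias is taken as the mean deviation from 1 (a median would be an equally reasonable choice); the function names and the example series are hypothetical.

```python
import random
import statistics

K_S = 60 * 15 * 1e-6   # Equation 2 factor for 15-min data (t)


def bootstrap_qsn(q, ssc, n, n_rep=100, seed=42):
    """Normalized annual loads Qs,n = Qs / Qs,ref for one year.

    q, ssc: complete 15-min series of the year. Each replicate splits the
    year into n equal blocks and draws one random instant per block.
    """
    rng = random.Random(seed)
    qs_ref = K_S * sum(a * b for a, b in zip(q, ssc))                  # Equation 2
    block = len(q) // n
    out = []
    for _ in range(n_rep):
        idx = [j * block + rng.randrange(block) for j in range(n)]
        qs = 0.0864 * 365.25 / n * sum(q[i] * ssc[i] for i in idx)    # Equation 3
        out.append(qs / qs_ref)
    return out


def bias_and_precision(qsn):
    """Bias as mean deviation from 1; precision as standard deviation (∆Qs,n)."""
    return statistics.mean(qsn) - 1.0, statistics.stdev(qsn)


# Hypothetical year (35,064 15-min values) with one day-long sediment pulse
q = [100.0] * 35064
ssc = [5000.0 if 10000 <= i < 10096 else 20.0 for i in range(35064)]
bias, spread = bias_and_precision(bootstrap_qsn(q, ssc, n=52))
```

With a single pulse, most weekly replicates miss it and underestimate the load while the few that hit it overestimate strongly, reproducing in miniature the skewed distributions discussed in the results.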

Quantification of Flow and Sediment Load Exceedance Time
Here, the exceedance time is the percentage of time during which 80% of the annual discharge or sediment load is transported. Following Meybeck et al. (2003), we use the notations Tw80% (time necessary to transport 80% of the water volume) and Ts80% (time necessary to transport 80% of the suspended solid flux). Assuming that peak loads are captured, monitoring must cover at least the bulk (here 80%) of the discharge and sediment load to estimate annual sediment loads with an uncertainty of <20%. In accordance with Moatar et al. (2020), we assume that Tw80% and Ts80% give a first-order approximation of the minimum required sampling time. To estimate Tw80% and Ts80%, we used a flow duration curve in which discharge and sediment loads are sorted in descending order and cumulated by rank (Phillips et al., 1999). All exceedance times were calculated from the 15-min reference time series of Q and Qs to avoid bias from the sampling interval. The relationship of Tw80% and Ts80% with ∆Qs,n was analyzed for each station and sampling interval (Section 2.2). This approach is similar to that of Moatar et al. (2020) but uses high-resolution 15-min sensor data sets (instead of daily values) as a reference, as well as synthetic low-frequency data sets with fixed sampling intervals (Section 2.2).
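The flow-duration-curve procedure for Tw80% and Ts80% reduces to a few lines. A minimal sketch, assuming an equally spaced series (e.g., 15-min values); the function name is hypothetical and the two example series are invented to make the ranking logic visible.

```python
def exceedance_time(values, share=0.8):
    """Percentage of time needed to carry `share` of the total (e.g. Ts80%).

    `values` is an equally spaced series (15-min Q for Tw80%, 15-min
    sediment load for Ts80%). As in a flow duration curve, values are
    sorted from highest to lowest and cumulated by rank.
    """
    ordered = sorted(values, reverse=True)
    target = share * sum(ordered)
    cumulative = 0.0
    for rank, v in enumerate(ordered, start=1):
        cumulative += v
        if cumulative >= target:
            return 100.0 * rank / len(ordered)
    return 100.0


# Hypothetical examples with equally long intervals
print(exceedance_time([90, 4, 3, 2, 1]))   # 20.0: one interval carries 90 %
print(exceedance_time([1] * 10))           # 80.0: uniform transport
```

A perfectly uniform series gives 80%, the upper bound; the more punctuated the transport, the smaller the exceedance time.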

Assessment of Sediment Rating Parameters on the Uncertainty of Annual Sediment Loads
Traditionally, annual sediment loads at gauging stations with a limited number of SSC measurements are derived from a sediment rating of SSC versus Q using a power law according to Equation 1. Since SSC varies owing to a multitude of environmental factors, SSC estimates from a sediment rating that relies only on Q are expected to carry large uncertainties, represented by the error term ε in Equation 1 (Cohn et al., 1992; Ferguson, 1986; Horowitz, 2003; Walling & Webb, 1981; Warrick, 2015). Accordingly, the variability of SSC derived from sediment rating is positively linked to (a) the variability of Q, (b) the magnitude of the rating exponent b, and (c) the error term ε. First, we used NLS regressions of SSC on Q to derive the optimal fit of a and b and to estimate ε from the distribution of the residuals of the NLS rating (Koch & Smillie, 1986; Warrick, 2015). NLS is preferred over log-transformed linear regression because of the bias introduced by the log transformation (Hoffmann et al., 2020). Lowess regression might perform better in predicting SSC as a function of Q for more complex SSC-Q relationships (Section 2.2); however, it does not yield the rating parameters a and b. Based on the power-law rating of SSC and Q, we argue that two river systems with the same Tw80% may differ strongly in Ts80% if their sediment rating exponents b differ. For similar Tw80%, the river system with the higher b-exponent should be characterized by a lower Ts80% (Moatar et al., 2013, 2020).
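The argument that b controls Ts80% for a fixed discharge series can be checked numerically. In this sketch ε is ignored, the daily hydrograph (base flow plus one flood per year) is entirely hypothetical, and the helper names are illustrative. In this idealized setting a multiplies all loads equally and therefore leaves Ts80% unchanged, while Tw80% is identical for every scenario because Q is the same.

```python
import math


def exceedance_time(values, share=0.8):
    """Percentage of time needed to carry `share` of the total, from values
    sorted in descending order and cumulated by rank."""
    ordered = sorted(values, reverse=True)
    target = share * sum(ordered)
    cumulative = 0.0
    for rank, v in enumerate(ordered, start=1):
        cumulative += v
        if cumulative >= target:
            return 100.0 * rank / len(ordered)
    return 100.0


def ts80_for_rating(q, a, b):
    """Ts80% of the load Q * SSC with SSC = a * Q**b (Equation 1 without ε)."""
    return exceedance_time([qi * a * qi ** b for qi in q])


# Hypothetical daily hydrograph: 20 m3/s base flow and one flood per year
q = [20 + 180 * math.exp(-((d % 365) - 100) ** 2 / 50.0) for d in range(3 * 365)]

tw80 = exceedance_time(q)   # identical for all rating scenarios
for b in (0.5, 1.0, 2.0):
    print(b, round(tw80, 1), round(ts80_for_rating(q, a=1.0, b=b), 1))
```

Raising b concentrates the load in the flood days, so the printed Ts80% shrinks as b grows while Tw80% stays fixed.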
We multiplied the optimal b (b_opt) by factors of 0.5, 0.8, 1.2, and 1.5 for each station to cover the range of b reported by Syvitski et al. (2000) and to ensure comparability of results. In this case, a is fixed to the optimal a (a_opt) derived from the NLS regression. The same procedure was repeated by varying a_opt by factors of 0.5, 0.8, 1.2, and 1.5 with b fixed at b_opt for each station to evaluate the effect of the rating parameters on Ts80% and on the uncertainty of the annual load. This results in eight synthetic SSC time series per station. The sediment load of each synthetic time series is based on the same discharge for each station (and thus on the same Tw80%), but the different a and b scenarios lead to different values of Ts80%. For changes in the error term ε, we used a randomly selected ε from the regression analysis to calculate the synthetic SSC time series. The randomly selected errors were multiplied by the same factors as a and b (i.e., ε·0.5, ε·0.8, ε·1.2, and ε·1.5) to evaluate the effect of different magnitudes of the error term. For large factors, some (negative) errors are likely to exceed the corresponding modeled SSC values in magnitude, leading to negative sediment concentrations. In the case of negative SSC values, we set the corresponding error term to 0 to avoid calculating negative loads (Asselman, 2000).
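The scenario design can be sketched as below. Assumptions, labeled here and in the code: the a and b scenarios are computed without ε, the ε scenarios use a single random draw from the NLS residuals that is then scaled by each factor, and the function names and example inputs are hypothetical.

```python
import random

FACTORS = (0.5, 0.8, 1.2, 1.5)


def rating_ssc(q, a, b, eps=None):
    """SSC = a * Q**b + ε (Equation 1). Where adding ε would give a negative
    SSC, that error term is set to 0."""
    eps = eps if eps is not None else [0.0] * len(q)
    out = []
    for qi, ei in zip(q, eps):
        base = a * qi ** b
        out.append(base + ei if base + ei >= 0 else base)
    return out


def rating_scenarios(q, a_opt, b_opt, residuals, seed=0):
    """Synthetic SSC series for scaled b, a, and ε (12 series in total).

    Assumption: the a and b scenarios are computed without ε; the ε
    scenarios use one random draw from the NLS residuals, scaled by each factor.
    """
    rng = random.Random(seed)
    eps = [rng.choice(residuals) for _ in q]
    scenarios = {}
    for f in FACTORS:
        scenarios[("b", f)] = rating_ssc(q, a_opt, b_opt * f)
        scenarios[("a", f)] = rating_ssc(q, a_opt * f, b_opt)
        scenarios[("eps", f)] = rating_ssc(q, a_opt, b_opt, [e * f for e in eps])
    return scenarios


# Hypothetical discharge values (m3/s), rating parameters, and residuals (mg/L)
s = rating_scenarios([10, 50, 100, 500], a_opt=0.5, b_opt=1.2, residuals=[-1000.0, 5.0, -3.0])
```

Keeping one fixed draw of ε across the four error factors isolates the effect of the error magnitude from the randomness of the draw.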
Further, we investigate how the rating exponent b changes with different sampling schemes. Here, we determine which sampling scheme (and how many samples) is necessary to reproduce b, so that Ts80% can be estimated from b and uncertainties in load estimates can be evaluated and reduced. To this end, the data set is randomly sampled, and daily, weekly, and monthly averages are calculated along with subsets consisting of (sub)daily to monthly sampling, which are widely used sampling schemes. Furthermore, we used subsets with a fixed number (30) of random (RD) samples and added HD samples, as this has been shown to further decrease uncertainties in load estimates (Horowitz et al., 2015). The calculations with the respective subsets were repeated 100 times.
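The repeated subsampling of b can be sketched as follows. This is a simplification under explicit assumptions: b is estimated by least squares in log-log space rather than by NLS as in the study, HD samples are drawn from instants with Q above a hypothetical 95th-percentile threshold, `n_hd` (the number of added HD samples) is a free, hypothetical parameter, and the function names are illustrative.

```python
import math
import random
import statistics


def loglog_b(q, ssc):
    """Rating exponent b from least squares in log-log space
    (simplification: the study fits b by NLS instead)."""
    lx = [math.log(v) for v in q]
    ly = [math.log(v) for v in ssc]
    mx, my = statistics.mean(lx), statistics.mean(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    return sum((u - mx) * (w - my) for u, w in zip(lx, ly)) / sxx


def b_from_subsets(q, ssc, n_random=30, n_hd=0, hd_quantile=0.95, n_rep=100, seed=0):
    """Distribution of b from n_rep subsets of n_random random samples plus
    n_hd samples drawn from high-discharge (HD) instants, i.e. Q at or above
    the hd_quantile of all Q values (hypothetical threshold choice)."""
    rng = random.Random(seed)
    threshold = sorted(q)[int(hd_quantile * (len(q) - 1))]
    hd_pool = [i for i, v in enumerate(q) if v >= threshold]
    bs = []
    for _ in range(n_rep):
        idx = rng.sample(range(len(q)), n_random)
        idx += [rng.choice(hd_pool) for _ in range(n_hd)]
        bs.append(loglog_b([q[i] for i in idx], [ssc[i] for i in idx]))
    return bs
```

The spread of the returned b-values (e.g., `statistics.stdev`) indicates how reliably a given scheme reproduces the exponent; comparing runs with `n_hd=0` and `n_hd>0` mirrors the RD versus RD+HD comparison.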

Impact of the Sampling Interval on Annual Sediment Loads
The accuracy and precision of Qs,n estimates based on 100 randomly selected subsets show three general trends (Figure 2a). First, accuracy decreases with increasing sampling interval at each station. The decrease is smallest for the Rhine (station KAA) and largest for the Ammer (station WEI), with intermediate decreases for the Moselle (COC) and the Ilz (KAL). Thus, the decrease in accuracy is negatively related to the contributing catchment area (Table 1).
Second, the precision decreases at each station with decreasing sampling frequency. The precision is presented relative to the reference load derived from the 15-min time series (see Section 2.2). The imprecision for the Rhine (red) is <10% up to a sampling interval of 1/week, indicating that one sediment sample per week is sufficient to estimate the annual suspended sediment load within a 10% range over a period of 21 years. Even monthly sampling yields almost acceptable results, within the <20% error range, for the long-term average load. For the Moselle (orange), sampling frequencies of at least 3/week yield acceptable results (imprecision <10%). For the Ilz and the Ammer, our results indicate that daily sampling is required to reduce the uncertainty (i.e., imprecision) of annual load estimates to <10%. The imprecision for the smaller rivers (Ilz and Ammer) is likely underestimated for the 1 sample/month interval because the very punctuated flood events are missing from the 100 randomly selected subsets. Because medium to low discharges occur more often, the median normalized bootstrap sediment load is smaller than the reference sediment load, whereas the random sampling of high sediment transport events leads to an increased occurrence of outliers with a normalized sediment load >1.0. Thus, the impact of high sediment transport on annual sediment loads is far more pronounced than that of low sediment transport events, suggesting a skewed distribution of modeled sediment loads. The increase in underestimation and imprecision with discharge regime, from low (Rhine) to high (Ammer) flashiness, is clearly visible (Figure 2a). Potential bias due to different lengths of the time series (21 years at KAA and 6 years at WEI) can be neglected, as the analysis was also conducted with subsets of the same length (6 years) (see Figure S2 in Supporting Information S1). Independent of the length of the time series, the overall pattern of increasing underestimation from low to high flashiness remains. Third, Qs,n decreases with increasing sampling interval, especially at stations KAL and WEI, leading to a loss of accuracy with increased underestimation, presumably because floods, which contribute strongly to annual Qs, are missed (see Figures S3-S5 in Supporting Information S1 for example hydrographs, a comparison of cumulative loads for a hydrological year, and example discharge peak lengths). For stations KAA and COC, this pattern is only evident for very long sampling intervals (one sample per month). The underestimation of annual Qs for KAL and WEI is visible for sampling intervals >1 day. Although the contributing catchment areas are comparable in size, the much stronger underestimation of the annual suspended load of the Ilz compared to the Ammer indicates strong differences in catchment characteristics and discharge regime, likely connected to their SSC-Q rating parameters.
Figure 2b shows the accuracy and precision of estimates obtained with the sediment rating approach, in which daily discharge data are used to interpolate SSC to daily values from the randomly selected subsets with variable sampling intervals (as in Figure 2a). Using daily discharge data and a sediment rating approach increases the precision and accuracy of the annual suspended sediment load compared to the approach without daily mean discharge data. Furthermore, the tendency to underestimate sediment loads with long sampling intervals decreases slightly compared to load estimates without daily discharge data. However, bias and imprecision still exceed the tolerable range (±20%) for stations COC, KAL, and WEI. Consequently, highly variable discharge regimes require a quantification of accuracy and precision, especially given that annual variability might be higher during years with high-magnitude floods than the long-term uncertainties (see Figure S6 in Supporting Information S1 for the annual variability of sediment load per sampling interval at station COC).

Impact of Flow and Sediment Load Exceedance Time on Uncertainty of Annual Qs
Figure 3 shows the cumulative distributions of Q and Qs for each station over the full monitoring periods, using daily mean values calculated from the 15-min sensor time series (compare Table 1). The cumulative distributions of the sediment loads have steeper initial curves (at low cumulative times), indicating a stronger sensitivity of sediment transport to large events compared to cumulative discharge (Figure 3). This becomes particularly evident for stations WEI, KAL, and COC, which have smaller catchment areas (<30,000 km²). Ts80% of the Ammer at WEI ranges from 1% to 3.2% (mean 2.1%, 8 days). At KAL, Ts80% (12.2%, 42 days) is comparable to that of the Moselle at COC (12.9%, 44 days), despite the distinctly smaller catchment size (Table 1). At the same time, annual Ts80% at station KAL ranges from 3.2% to 17.8% (see Figure S7 in Supporting Information S1 for annual Ts80%), indicating strong variations between single years. This variation is of the same order of magnitude as at station COC, where Ts80% ranges from 5.4% to 21.8%. Thus, single years show high variability in sediment transport with punctuated flood events at the Moselle. A much higher mean Ts80% (47.1%, 173 days) is found at station KAA on the Rhine, where annual Ts80% ranges from 30.7% to 60.3%. While Ts80% tends to increase with contributing catchment size, Tw80% shows no such simple relationship and appears to be confounded by other processes (Figure S8 in Supporting Information S1).
Figure 4 shows the relationship between exceedance times (Ts80% and Tw80%) and the uncertainty of the annual sediment load (∆Qs,n) derived for sampling intervals of one sample per week and three samples per month. ∆Qs,n decreases exponentially with increasing Ts80% (Figure 4a). Ts80% in this panel is calculated from high-resolution (15-min) SSC data sets. Because SSC data sets with 15-min intervals are seldom available, we assessed the relationship between Tw80% (based on high-resolution, daily gauging data) and ∆Qs,n in Figure 4b. However, Tw80% does not show a relationship with ∆Qs,n comparable to that of Ts80% (Figure 4a). This is likely due to differences in the sediment rating of each station (represented by the b-values of the NLS rating curve) (Moatar et al., 2013, 2020). For Figure 4c, we used low-resolution SSC data sets with sampling intervals of 1/week and 3/month and interpolated them to daily SSC values by sediment rating, using lowess regression and daily discharge data (compare Section 2.2; see Figure S9 in Supporting Information S1 for an example of the lowess regression). Ts80% values derived from infrequent data interpolated to daily SSC show a negative exponential relationship with ∆Qs,n similar to that in Figure 4a (based on 15-min SSC values), indicating that Ts80% derived from interpolated infrequent SSC data can be used to predict the uncertainty of calculated annual loads.
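The negative exponential relationship in Figure 4a can be summarized by two coefficients. The sketch below assumes the form ∆Qs,n = c·exp(−k·Ts80%) and fits it by ordinary least squares on ln(∆Qs,n); this log-linear fit is one simple choice (it weights relative rather than absolute residuals) and is not necessarily the regression used for the figure. The function name and the check values are illustrative and synthetic, not study data.

```python
import math


def fit_negative_exponential(ts80, delta):
    """Fit delta = c * exp(-k * ts80) via least squares on ln(delta).

    ts80  : sediment load exceedance times (%)
    delta : uncertainties of the annual load (%), all > 0
    Returns the coefficients (c, k).
    """
    if len(ts80) != len(delta) or len(ts80) < 2:
        raise ValueError("need at least two paired observations")
    ys = [math.log(d) for d in delta]
    n = len(ts80)
    mean_x = sum(ts80) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in ts80)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ts80, ys))
    slope = sxy / sxx  # slope of ln(delta) against ts80
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free points generated from c = 60, k = 0.05.
ts = [2.0, 12.0, 13.0, 47.0]
dq = [60.0 * math.exp(-0.05 * t) for t in ts]
c, k = fit_negative_exponential(ts, dq)
print(round(c, 6), round(k, 6))  # 60.0 0.05
```

With real (scattered) points, the log-linear and a nonlinear least-squares fit can give somewhat different coefficients.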
Figure 5 shows the relationship between ∆Qs,n and Ts80% for all considered sampling intervals. For any given sampling interval, ∆Qs,n derived from instantaneous (infrequent) measurements (solid lines in Figure 5) is almost always larger than ∆Qs,n derived from lowess-interpolated daily SSC values (dashed lines in Figure 5). The shaded areas at the bottom of Figure 5 indicate the sampling interval that is sufficient to obtain reliable annual load estimates with ∆Qs,n < 20%. Monthly sampling does not yield annual load estimates with an error <20% (Figure 5). If daily discharge data are unavailable (so that SSC cannot be interpolated), the threshold values increase for every sampling interval (solid regression curves), requiring a higher sampling frequency to obtain annual load estimates of similar quality.
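Once a curve ∆Qs,n = c·exp(−k·Ts80%) has been fitted for each sampling interval, choosing an interval reduces to comparing the predicted error with the 20% threshold. In the sketch below, the per-interval coefficients are made-up placeholders for illustration, not values fitted in this study, and the function names are hypothetical; the dictionary is assumed to be ordered from the least to the most frequent interval.

```python
import math


def threshold_ts80(c, k, max_error=20.0):
    """Ts80% (in %) above which c * exp(-k * Ts80%) falls below max_error."""
    return math.log(c / max_error) / k


def least_frequent_interval(ts80, curves, max_error=20.0):
    """Return the least frequent interval whose predicted error is below
    max_error, or None if no interval qualifies.

    curves: dict mapping interval labels to fitted (c, k), ordered from
    least to most frequent sampling.
    """
    for label, (c, k) in curves.items():
        if c * math.exp(-k * ts80) < max_error:
            return label
    return None


# Hypothetical coefficients, for illustration only.
curves = {
    "1/month": (150.0, 0.02),
    "1/week": (100.0, 0.04),
    "1/day": (30.0, 0.3),
}
print(least_frequent_interval(47.0, curves))  # 1/week
print(least_frequent_interval(2.0, curves))   # 1/day
print(round(threshold_ts80(100.0, 0.04), 1))  # 40.2
```

A station whose Ts80% exceeds the threshold of a given curve can, under these assumptions, be sampled at that interval; `None` signals that none of the listed intervals is sufficient.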

Effects of Sediment Rating on Uncertainty of Annual Loads
The effect of the sediment rating curve on the calculation of annual suspended sediment loads is analyzed for all stations with a sensitivity analysis of the rating parameters (as described in Section 2.4). Figure 6 shows the sediment rating for station COC together with the rating curves for a modified rating exponent b (Figure 6a; rating parameter a fixed). As expected, increasing b produces steeper rating lines and strong overestimation of SSC relative to the NLS fit with b_opt. In contrast, decreasing b produces gentler rating lines and strong underestimation (much lower SSC at the same discharge) relative to the optimal NLS regression. Changes in rating parameter a (with rating exponent b fixed) shift the rating line in parallel, upward for increasing a and downward for decreasing a (Figure 6b). As Figure 6 shows, the regression lines change much more for changes in b than for changes in a. This is also reflected in the variation of the root mean square error of the regression analysis for all stations (see Table S1 in Supporting Information S1). Here, however, we are less interested in whether the modified rating parameters reproduce the measurements; instead, we use the sensitivity analysis to generate synthetic SSC time series with variable a and b that are applied to the same discharge data. Because b is an exponent, applying the same factors (i.e., 0.5, 0.8, 1.2, and 1.5) as for a and ε might skew the comparison. However, we chose this range of factors to cover the range of b-values (approx. 0 to 2.5) found in rivers on a global scale (Syvitski et al., 2000). As shown in Figure 7, increasing the b exponent consistently decreases the precision of annual loads at all sampling intervals (relative to Qs,ref, which is calculated using the 15-min sampling interval). In contrast, changes in a and ε do not lead to significant changes in the uncertainty of the annual load.
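The generation of synthetic SSC series for the sensitivity analysis can be sketched as follows. The code assumes a power-law rating SSC = a·Q^b + ε, with ε drawn from a normal distribution whose standard deviation is the residual standard error (RSE) of the NLS fit, scaled by the same factors as a and b; following the rule described in Section 2.4, ε is set to 0 wherever it would produce a negative SSC. How ε is actually generated in the study is not specified in this section, so that part is an assumption, and all names are hypothetical.

```python
import random

FACTORS = (0.5, 0.8, 1.2, 1.5)  # multipliers applied to a, b and ε


def synthetic_ssc(q, a, b, rse, f_a=1.0, f_b=1.0, f_eps=1.0, seed=0):
    """Synthetic SSC series from a rating curve with scaled parameters.

    SSC = (f_a * a) * Q ** (f_b * b) + eps, with eps ~ N(0, f_eps * rse).
    If eps would make SSC negative, eps is set to 0 for that time step.
    """
    rng = random.Random(seed)  # seeded for reproducible series
    out = []
    for qi in q:
        base = (f_a * a) * qi ** (f_b * b)
        eps = rng.gauss(0.0, f_eps * rse)
        out.append(base + eps if base + eps >= 0.0 else base)
    return out


# Same discharge series, one noise-free synthetic SSC series per scaled b.
q = [1.0, 10.0, 100.0]
for f in FACTORS:
    ssc = synthetic_ssc(q, a=2.0, b=1.0, rse=0.0, f_b=f)
    print(f, [round(s, 2) for s in ssc])
```

Because the truncation rule only ever removes negative ε values, it shifts the synthetic series upward on average, which is one way the scatter term can bias annual loads.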
Since rating exponent b is related to the variability of sediment transport, we compared b with Ts80% (Figure 8, Table 2). Ts80% decreases at all stations when suspended sediment loads are calculated with increasing b from the same underlying discharge time series. Our results indicate a much faster decrease of Ts80% with b in smaller catchments, which is also evident from the slopes of the linear regression lines: in general, the slopes for the larger catchments (COC, KAA) are smaller than those for the smaller catchments (WEI, KAL). However, the slope at COC is smaller than at KAA, even though the Moselle catchment is smaller than the Rhine catchment. Further, Ts80% at station WEI increases strongly compared to the reference time series based on 15-min sensor data (Table 2): Ts80% derived from the NLS regression is more than six times higher than Ts80% from the sensor time series. This is likely attributable to the fit of the rating curve (Table 1), as ε at WEI (160.2) is three times higher than at station KAA (52.7). In comparison, Ts80% at station KAA increases by only 3.3 percentage points, from 47.1% for the reference time series to 50.4% for sediment loads calculated with the optimal NLS rating curve (NLS_opt).
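The decrease of Ts80% with b follows from the shape of the rating: with SSC = a·Q^b, the daily load scales with Q^(b+1), so a larger b concentrates the load on the highest-discharge days, while a multiplies every load by the same constant and cancels out of the cumulative shares. The minimal sketch below illustrates this with a hypothetical, deterministic discharge series (not station data), defining Ts80% by ranking daily loads from largest to smallest and fitting the slope of Ts80% against b by ordinary least squares; names and values are illustrative assumptions.

```python
def exceedance_time(values, fraction=0.8):
    """Share of days (%) needed to carry `fraction` of the total, largest first."""
    target = fraction * sum(values)
    running = 0.0
    for count, value in enumerate(sorted(values, reverse=True), start=1):
        running += value
        if running >= target:
            return 100.0 * count / len(values)
    return 100.0


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical daily discharge: slowly rising base flow plus seven floods.
q = [5.0 + 0.01 * d + (200.0 if d % 60 == 0 else 0.0) for d in range(365)]

b_values = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0]
ts80 = []
for b in b_values:
    loads = [qi ** b * qi for qi in q]  # SSC = Q**b (a = 1), load = SSC * Q
    ts80.append(exceedance_time(loads))

print(ts80)
print(ols_slope(b_values, ts80))  # negative: Ts80% shrinks as b grows
```

The relationship in this toy example need not be linear; the sketch only shows the direction of the effect, not the station-specific slopes in Figure 8.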
In contrast to changes in the rating exponent b, changes in the rating coefficient a and in the distribution of the error term ε yield no or only marginal changes in Ts80%, respectively. The marginal changes of Ts80% due to the variation of ε might be related to the fact that we set ε to 0 whenever it would have generated negative SSC values (see Section 2.4).
Because of the key control of the sediment rating exponent b on Ts80% and ∆Qs,n, we compared how different averaging and sampling schemes affect the estimation of b (Figure 9). At all stations, rating exponents b derived from low-frequency manual calibration samples differ distinctly from those derived from the full, high-resolution sensor data set. For example, the b-exponent from the 15-min data is higher at KAL, suggesting either higher variability or a tendency for calibration samples to be taken at lower discharge. Using 30 random samples (RD) to establish a robust rating relationship yields better results with increasing catchment size. Additional high-discharge samples (HD) yield better results only at station COC. RD averaging schemes (using daily, weekly, and monthly means) achieve comparably high accuracy with respect to the entire calibration sample data set. RD sampling schemes ranging from 12 samples/day to 1 sample/month show increasing imprecision with decreasing sampling frequency. At station KAL this trend is almost linear: the median b-exponent from monthly random sampling is 0.89, whereas daily random sampling yields 1.15. At station KAA, in contrast, the accuracy of all RD sampling schemes is of the same order of magnitude, with only minor changes and slightly higher imprecision for monthly sampling. Similar results are observed at station COC. At station WEI (Figure 9d), imprecision is high, especially when high-discharge samples are used; the median b-exponent increases to almost 2, indicating even more variability than the entire calibration data set.

[Figure caption fragment: …4c). Solid lines are low-frequency "synthetic" time series derived from 15-min turbidity data (as plotted in Figure 4a).]
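The RD idea of estimating b from a limited number of samples can be mimicked by repeatedly fitting b to random subsets. The sketch below is an illustration under stated assumptions: it estimates b by ordinary least squares in log-log space, a simpler stand-in for the NLS regression used in the study, uses synthetic calibration data, and treats the subset size of 30 and the number of repetitions as free parameters rather than a reproduction of the Figure 9 schemes. All names are hypothetical.

```python
import math
import random
import statistics


def rating_exponent(q, ssc):
    """Estimate b in SSC = a * Q**b by OLS on ln(SSC) vs ln(Q).

    A log-log fit is used here as a simple stand-in for NLS; the two can
    give different b-values for scattered data. Requires Q, SSC > 0.
    """
    lx = [math.log(v) for v in q]
    ly = [math.log(v) for v in ssc]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    sxx = sum((x - mx) ** 2 for x in lx)
    return sxy / sxx


def subset_exponents(q, ssc, n_samples=30, n_repeats=200, seed=0):
    """b-values estimated from repeated random subsets of n_samples pairs."""
    rng = random.Random(seed)
    idx = list(range(len(q)))
    out = []
    for _ in range(n_repeats):
        pick = rng.sample(idx, n_samples)
        out.append(rating_exponent([q[i] for i in pick], [ssc[i] for i in pick]))
    return out


# Synthetic calibration data with multiplicative scatter (generated with b = 1.2).
rng = random.Random(42)
q = [rng.uniform(5.0, 500.0) for _ in range(500)]
ssc = [0.02 * v ** 1.2 * math.exp(rng.gauss(0.0, 0.5)) for v in q]
b_all = rating_exponent(q, ssc)
b_sub = subset_exponents(q, ssc)
print(round(b_all, 2), round(statistics.median(b_sub), 2))
print(min(b_sub), max(b_sub))  # spread of b from 30-sample subsets
```

The spread between the smallest and largest subset estimates gives a rough feel for how much b can vary when only a few dozen samples are available.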

Uncertainties Induced by the Sampling Interval
The impact of the sampling interval has been assessed in previous publications (Horowitz et al., 2015; Moatar et al., 2006; Skarbøvik et al., 2012). It is well established that increasing the sampling frequency improves accuracy (Hoffmann et al., 2017; Horowitz et al., 2015). As shown in Moatar et al. (2020), the overarching goal remains to identify the best-suited sampling interval as a trade-off between accuracy and the workload of manual sampling. We used an easily replicable bootstrapping approach to test multiple sampling intervals with varying sample schedules and sampling frequencies. In contrast to previous studies, we use load estimates derived from high-frequency (15-min) turbidity measurements as the reference for annual load estimates. Similar approaches have been applied and show the necessity of hydrologically based sampling, with increased sampling frequency during floods and reduced frequency during low water (Horowitz, 2003; Horowitz et al., 2015; Walling & Webb, 1985). Based on the calculation of rating exponent b, hydrologically based sampling might not work for small, highly variable catchments if the overall number of samples is too small (Figure 9). In such catchments, averaging and/or increasing the sampling frequency (e.g., using turbidity sensors) appears to improve the approximation of b relative to b derived from the full time series.
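A bootstrap of this kind can be sketched in a few lines. The sketch below assumes that the reference load is the sum of 15-min fluxes Q·SSC, that each bootstrap subset takes every n-th value after a random start offset, and that the annual load is estimated by scaling the mean sampled flux to the full period (an estimate without daily discharge data). Accuracy is summarized here as the median relative error and precision as the range between the 10th and 90th percentiles; these summaries, the regular spacing of samples, the synthetic series, and all names are illustrative assumptions rather than the exact procedure of the study.

```python
import random
import statistics


def reference_load(q, ssc, dt=900.0):
    """Reference load from a 15-min series: sum of Q * SSC * dt."""
    return sum(qi * ci for qi, ci in zip(q, ssc)) * dt


def bootstrap_errors(q, ssc, step, n_boot=500, seed=0, dt=900.0):
    """Relative errors (%) of loads estimated from regular subsamples.

    step: sampling interval in time steps (e.g. 96 = daily, 672 = weekly
    for 15-min data). Each run draws a random start offset, keeps every
    `step`-th flux and scales the mean sampled flux to the whole period.
    """
    rng = random.Random(seed)
    ref = reference_load(q, ssc, dt)
    duration = len(q) * dt
    errors = []
    for _ in range(n_boot):
        start = rng.randrange(step)
        fluxes = [q[i] * ssc[i] for i in range(start, len(q), step)]
        estimate = statistics.fmean(fluxes) * duration
        errors.append(100.0 * (estimate - ref) / ref)
    return errors


def accuracy_precision(errors):
    """Median error (accuracy) and 10th-90th percentile range (precision)."""
    deciles = statistics.quantiles(errors, n=10)
    return statistics.median(errors), deciles[-1] - deciles[0]


# Synthetic 15-min series (one flood pulse in ten days) for illustration.
n = 96 * 10
q = [10.0 + (90.0 if 400 <= i < 440 else 0.0) for i in range(n)]
ssc = [0.05 * v ** 1.5 for v in q]
for step in (4, 96, 672):  # hourly, daily, weekly
    print(step, accuracy_precision(bootstrap_errors(q, ssc, step)))
```

With a short flood pulse, the longer steps frequently miss the pulse entirely, which shows up as a negative median error (underestimation) and a wide percentile range.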
The results presented here are limited to the range of discharge regimes (from large and sluggish to rapidly changing) and climate conditions (wet mountain and mid-latitude climates of Central Europe) represented by the river systems selected for this study. However, the presented approach, which links sediment load exceedance times (Ts80%) to uncertainty estimates of annual loads, can be applied to any gauging station that shows a power-law rating between suspended sediment concentration and discharge.
Challenges might arise when transferring this approach to gauging stations with complex or non-steady rating relationships, for instance, rivers with seasonal variations of the rating coefficients (e.g., in arid or tropical regions), implying the need for seasonally adapted sampling intervals (Moatar et al., 2020). In such cases, manual sampling and maintenance of surrogate sensors might be partially reduced during low flow and increased during the flood season (Rottler et al., 2021).
Overall, surrogate sampling remains the best option to obtain reliable annual load estimates for rivers with complex sediment transport regimes (J.R. Gray & Gartner, 2009). While surrogate sensors offer clear benefits in terms of sampling frequency, they require continuous maintenance and sound calibration to ensure sufficient quality of the monitored time series. Uncertainties induced by (bio)fouling, organic matter, and disruption of the backscatter signal have to be identified and the data set checked for reliability (Hoffmann et al., 2017). Accordingly, monitoring gauges should be checked regularly, especially in smaller catchments with higher variability, where floods are often short (Figure S5 in Supporting Information S1) but have a high impact on sediment transport. In the smaller catchments, missing or incorrect data are recorded much more often than at the Moselle and the Rhine (see Table 1). Despite the lower variability of large river systems such as the Rhine and Moselle, punctuated flood events in a single year may not be resolved by infrequent manual samples. Therefore, OBS sensors combined with manual water sampling (to calibrate the sensor signal to suspended sediment concentrations) provide the best means to estimate reliable suspended sediment loads in rivers.

At the global scale, high-resolution SSC data sets remain scarce, but discharge data are generally available at sufficient temporal resolution for most river systems. In these cases, discharge-based sediment rating is a commonly used approach to interpolate missing SSC data (Asselman, 2000; J. R. Gray & Simões, 2008; Moatar et al., 2013, 2020; Warrick, 2015). Here, we applied lowess regression to interpolate SSC in random subsets of our high-frequency time series that represent infrequent "sampling intervals." We argue that lowess regression is a better tool for predicting/interpolating SSC from daily Q data than a power-law regression (Hoffmann et al., 2020), which might not be suitable for all river systems, or rating curve separation (Raymond et al., 2013), which requires larger data sets. Interpolating SSC values from daily discharge for the stations with catchment sizes <1,000 km2 (Figures 2 and 3) does not significantly increase the accuracy and precision of annual load estimates. This might result from the averaging effect of the rating approach, which does not reproduce the strong scatter of measured SSC around the rating line (Ferguson, 1986; Hoffmann et al., 2020; Koch & Smillie, 1986; Warrick et al., 2013). However, the interpolation enables an approximation of Ts80%, and thereby an evaluation of the uncertainties of annual load estimates.
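The lowess interpolation step can be sketched without any library. The code below is a simplified, single-pass lowess (local linear regression with tricube weights, without the robustness iterations of the full algorithm) applied in log-log space, with `frac` as the share of sampled points used in each local fit. The log transform, `frac = 0.5`, the sample values, and the function names are assumptions for illustration, since the exact settings are described in the Methods.

```python
import math


def lowess_predict(x_obs, y_obs, x_new, frac=0.5):
    """Single-pass lowess: a tricube-weighted local linear fit at each x0."""
    n = len(x_obs)
    k = min(n, max(3, math.ceil(frac * n)))  # points per local fit
    preds = []
    for x0 in x_new:
        # Bandwidth slightly beyond the k-th nearest point so all k get weight.
        h = sorted(abs(x - x0) for x in x_obs)[k - 1] * 1.000001 + 1e-12
        sw = swx = swy = swxx = swxy = 0.0
        for x, y in zip(x_obs, y_obs):
            u = abs(x - x0) / h
            w = (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0
            sw += w
            swx += w * x
            swy += w * y
            swxx += w * x * x
            swxy += w * x * y
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:  # degenerate window: use weighted mean
            preds.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            preds.append((swy - slope * swx) / sw + slope * x0)
    return preds


def interpolate_daily_ssc(q_sampled, ssc_sampled, q_daily, frac=0.5):
    """Predict daily SSC from daily mean Q with lowess in log-log space.

    All values must be positive. Back-transforming with exp() ignores
    retransformation bias.
    """
    fitted = lowess_predict([math.log(v) for v in q_sampled],
                            [math.log(v) for v in ssc_sampled],
                            [math.log(v) for v in q_daily], frac)
    return [math.exp(p) for p in fitted]


# Eight hypothetical "manual samples" and three days of mean discharge.
q_s = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0, 200.0]
c_s = [2.0 * v ** 1.3 for v in q_s]
print(interpolate_daily_ssc(q_s, c_s, [3.0, 15.0, 150.0]))
```

Because each local fit is linear in log space, an exact power law is reproduced exactly, while curvature in the rating is followed locally; in practice, a library implementation with robustness iterations (e.g., the `lowess` smoother in statsmodels) would be the usual choice.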
We show that daily SSC data may not be necessary to obtain long-term annual average loads within a ±20% confidence range for large, sluggish rivers such as the Rhine (KAA). For these rivers, SSC can be sampled less frequently than discharge if discharge is measured at high frequency (15-min to daily) and a sufficiently strong relationship between SSC and Q exists (Moatar et al., 2020). In this case, SSC sampling can be shifted to longer intervals, supplemented by more frequent sampling during large discharge events (Horowitz et al., 2015). However, discharge monitoring adds an uncertainty of up to 20% during high flows, which further affects load estimates (Ziegler et al., 2014). Thus, the sampling interval needs to be adapted to the main question of the user. For example, annual estimates with high precision require increased sampling during flood events and gapless monitoring, whereas reliable long-term averages require samples covering a large range of discharges, and the overall number of samples can be reduced. In both cases, site-specific conditions should be considered.

Approaching Variability Through Sediment Load and Flow Exceedance Time
Exceedance times and flashiness indices have been used as reliable measures of the variability of suspended sediment transport (Meybeck et al., 2003; Moatar et al., 2006, 2013, 2020). Because exceedance times strongly depend on the occurrence of flood events, time series spanning multiple years are required to cover a representative range of discharge and sediment transport, as precision and accuracy increase with the length of the monitoring period (Cheviron et al., 2014). With stations WEI and KAL covering six and seven years of high-frequency monitoring, respectively, we assume that their data are representative; at WEI, Ts80% is consistently short (mean 2.11%) over the 6-year monitoring period, suggesting annually recurring flood events. An average Ts80% of 2.11% at WEI means that 80% of the annual suspended sediment load is transported during flood events within less than 8 days. However, at stations COC and KAA, major floods (such as the Rhine floods of 1993 and 1995) are not covered by the sensor-based monitoring periods, suggesting that longer time periods are needed to cover the full range of potential exceedance times (Disse, 2001).
Annual Ts80% at station KAL is larger, ranging between 13.3% and 17.8% (45-65 days for 80% exceedance) in five of the seven years, indicating medium to minor floods. Individual years show major flood events at the Ilz (KAL), with Ts80% as low as 3.2% (less than 12 days in 2018) and 4.8% (less than 18 days in 2015). Stations KAL and WEI have highly different Ts80% values despite similar catchment sizes (Table 1). Even more interesting, Tw80% of the Ammer river (station WEI) indicates less rapidly occurring floods than at the Ilz river (station KAL), which would suggest a longer Ts80% at WEI than at KAL (Figure 3). Sediment transport in the Ammer river (WEI) is strongly affected by the Alpine topography. It is characterized not only by storm events but also by rapid snowmelt events that lead to flash flood discharges and rapidly rising and falling sediment transport (Gericke & Venohr, 2012). Catchments with alpine topography are generally characterized by steeper rating curves, with readily available sediment sources that accumulate seasonally and are (re)mobilized as discharge increases (Vercruysse et al., 2020). This also holds for the Ammer catchment, which shows the largest rating exponent of the four catchments (b_opt = 1.6), while the Ilz river at KAL shows the smallest rating exponent (b_opt = 0.95) along with the highest a-value.
Average Ts80% values at COC and KAL are almost identical (Figure 3). Despite the large difference in catchment size (Table 1), our findings indicate comparable variability. This might be caused by the comparably short response time of the Moselle to hydro-meteorological changes (Sutari et al., 2020), which is evident from the lowest Tw80% (42%) of all four stations (Figure 3) and a steeper sediment rating (b = 1.2, a = 0.02) than at the KAL station (b = 0.95, a = 1.87). The short Tw80% of the Moselle appears to be strongly influenced by its topography. The pluvial flow regime, combined with snowmelt during the winter months, leads to annually recurring high-discharge events (Rottler et al., 2021). In comparison, discharge at station KAL (with a much smaller contributing catchment) is more evenly distributed throughout the year (Tw80% = 53%). This coincides with a more homogeneous distribution of precipitation between summer and winter, although overall precipitation is twice as high (LfU, 2006). Additionally, the two catchments differ strongly in their characteristics: the river slope of the Ilz is significantly higher than that of the Moselle (Gericke & Venohr, 2012; Viroux, 1997).
Regarding the relationship between Ts80% and Tw80%, we observed that Tw80% becomes a better approximation of Ts80% with increasing catchment size (Figure S8 in Supporting Information S1). This is likely caused by differences in sediment storage and supply limitation among the four catchments. High sediment transport events at KAL and WEI occur within a very short time (sometimes less than one day). Sediment supply in these (low) mountain areas is intermittent, and landslides, which contribute strongly to sediment transport, are mainly driven by soil moisture (which is increased by the overall higher precipitation) (Battista et al., 2022). Battista et al. (2022) also link supply limitation in (pre-)alpine rivers to scatter in rating curves, with higher scatter indicating higher variability and less supply limitation. For the larger catchments, we assume that sediment storage, favored by gentler slopes and less precipitation, is more dominant and that remobilization requires much larger discharges, bringing Ts80% and Tw80% closer together. In summary, catchment size is not the only factor controlling variability and uncertainties in annual load estimates. Topography and land use, which both control the availability of sediment during floods and thus the sediment rating, are crucial factors controlling Ts80% and therefore the uncertainty of annual load estimates (Moatar et al., 2020). Accordingly, the ±20% accuracy threshold may need to be modified depending on the complexity of the watershed and the hydrography of the system.

Uncertainties Induced by the Temporal Variability
Here we use Ts80% as an indicator of the uncertainty of annual load estimates, since we argue that at least 80% of the annual load must be captured to estimate loads with an imprecision of less than 20%. This approach is linked to Moatar et al. (2020), who related load and flow flashiness to export indicators (i.e., the value of b_high from the segmented rating curve) using a large data set with infrequent SSC samples and frequent discharge data. In contrast to such indicators, Ts80% is expressed in units of time and is thus directly comparable to the sampling interval, which is also expressed in units of time. Our results indicate that the optimal sampling interval should be much shorter than Ts80%. For the Rhine (Ts80% = 47% at station KAA) and the Ammer (Ts80% = 2% at station WEI), sampling frequencies of at least 3/month and 1/day, respectively, are required to reduce the average uncertainty of annual loads below 20%. While we used a fixed 20% uncertainty threshold in this study, our approach can be adapted to other uncertainty levels depending on the aim of the study or monitoring program.
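Because Ts80% is a share of time, it can be turned into a rough expected number of samples that fall within the days carrying 80% of the annual load. The back-of-the-envelope calculation below assumes that sampling dates are independent of flood timing and uses the mean Ts80% values reported above (2.1% at WEI, 47.1% at KAA); it illustrates the argument and does not replace the bootstrap results.

```latex
\begin{aligned}
E[n_{80}] &= N_{\text{year}} \cdot \frac{T_{s80\%}}{100} \\
\text{WEI, weekly } (N_{\text{year}} = 52):\quad & 52 \times 0.021 \approx 1.1 \\
\text{WEI, daily } (N_{\text{year}} = 365):\quad & 365 \times 0.021 \approx 7.7 \\
\text{KAA, monthly } (N_{\text{year}} = 12):\quad & 12 \times 0.471 \approx 5.7 \\
\text{KAA, 3/month } (N_{\text{year}} = 36):\quad & 36 \times 0.471 \approx 17.0
\end{aligned}
```

Under these assumptions, weekly sampling at WEI would place only about one sample per year on the days that carry 80% of the load, which is consistent with the need for daily sampling there, whereas sampling three times per month at KAA already places roughly 17 samples in that window.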
We show that Ts80% correlates strongly with uncertainties in annual load estimates (Figure 5a). However, in contrast to Moatar et al. (2006), we find no consistent correlation between Tw80% and the uncertainty of the annual sediment load across our stations (Figure 5b). We attribute this missing link to differences in sediment rating between stations (expressed by the variability of the b exponent), which affect sediment load variability at stations with similar discharge variability (see Section 4.4). Further, we see strong differences in variability despite similar catchment sizes (WEI and KAL), which are most likely caused by topography and sediment availability.
Our results imply that quantifying Ts80% enables an assessment of the reliability of annual load estimates. However, low-frequency sampling captures only part of the variability and therefore yields higher Ts80% values than high-frequency sampling, which captures a larger part of the variability (Figure 5). These findings are expected, but they imply that high-frequency measurements, which are rarely available, are necessary to estimate Ts80%.
Our results show that interpolating low-frequency sediment samples to daily SSC values using lowess regression and daily discharge data (Section 2.3) is a reliable workaround to estimate Ts80%. Ts80% calculated from daily values of lowess-interpolated SSC captures the same negative exponential trend (Figure 4c) as Ts80% derived from the measured 15-min SSC (Figure 4a). Thus, our results indicate that high-resolution Ts80% values can be derived from high-resolution discharge data and a sediment rating based on lowess regression of low-frequency SSC samples. This approach enables the quantification of uncertainties in annual sediment loads at stations with infrequent sediment samples but high-resolution discharge data (Moatar et al., 2020).
Based on the exponential relationship between the uncertainty of the annual load and Ts80% for various sampling intervals, we can assess which sampling interval is necessary (given Ts80%) to obtain annual load estimates with an error <20% (Figure 5). Here, our results support the findings of Moatar et al. (2020) and extend them by using high-resolution 15-min sensor data as the reference load, which likely provides more accurate reference load estimates than monthly SSC samples combined with daily discharge data. This increases the effectiveness of Ts80% as a proxy for the uncertainty of (annual) load estimates. The proposed sampling intervals per Ts80% (Section 3.2) are generally applicable but require that the Ts80% of a river station be known. Thus, continuous discharge measurements and a minimum of 30 samples (Figure 9) should be available to establish an approximate rating curve and derive Ts80% from rating exponent b. A minimum sampling interval can then be derived to obtain reliable annual load estimates and to refine an existing sampling scheme. However, our results (Figure 9) also support earlier studies showing that uncertainty can be significantly reduced if sampling frequency is increased during flood events (Horowitz et al., 2015; Moatar et al., 2006; Skarbøvik et al., 2012). At the same time, we show that sediment rating parameters derived from high-resolution sensor data may differ from those calculated from infrequent manual samples. Thus, the variability of a river system might be misinterpreted by rating approaches that use only a limited number of manual samples. As our sensitivity analysis of the rating parameters shows, small differences in rating exponent b might lead to large differences in (annual) load estimates. Therefore, surrogate sampling with frequent manual measurements is favorable for long-term monitoring networks (J.R. Gray & Gartner, 2009).
By quantifying the variability of sediment transport, we contribute to quantifying uncertainties in annual load estimates for less frequent data sets (sampling intervals longer than one day). Our results for the synthetic low-frequency time series are consistent across all stations in this study. As shown in Figure 2b, sediment-rating-based interpolation of low-frequency sediment samples to daily values did not significantly improve the estimation of annual loads relative to the 15-min reference time series. However, discharge-based interpolation improves the estimation of Ts80% (Figure 5) and can therefore be used to evaluate the uncertainty of annual suspended load estimates by combining high-frequency discharge data with low-frequency suspended sediment samples (Delmas et al., 2011). Nonetheless, depending on the fit of the sediment rating, the accuracy of Ts80% can decrease significantly, in which case an adapted sampling scheme is required.

Quantifying the Impact of Rating Parameters
In accordance with previous publications (Hoffmann et al., 2020; Jung et al., 2020; Syvitski et al., 2000; Warrick, 2015), our results indicate a high sensitivity of sediment load estimates to changes in rating parameter b, whereas changes in a and ε have a much smaller impact on the calculated annual load (Table 2). For the best fit of the NLS regression, too, rating exponent b is more important than rating parameter a (Ferguson, 1986; Walling & Webb, 1981). Our results support the view that regional topography (i.e., elevation, slope, relief) and catchment properties (i.e., land cover, land use, precipitation) are the main driving forces behind differences in rating exponent b. Large b-values are linked to the varying availability of sediment due to differences in topography and catchment properties (A.B. Gray, 2018; Hoffmann et al., 2020; Syvitski et al., 2000). This becomes evident when comparing the rating curves of stations KAL and WEI. The rating curve at station KAL is less steep (low b-value, high a-value) than at station WEI. This might be explained by the large forest cover of the contributing Ilz catchment, which leads to a modest increase of SSC with Q, whereas the large b-exponent at station WEI is in line with the steep alpine topography of its contributing catchment. Asselman (2000) discussed the combination of rating parameters a and b, noting that the steepness of the rating curve reflects higher sediment availability during flood events as well as erodibility (the susceptibility of soil to erosion) and erosivity (the intensity of erosive forces).
We found a negative linear relationship between Ts80% and b at each of the four stations (Figure 8). Synthetic SSC time series derived from the same discharge data but calculated with different b show strongly increased sediment transport during high-discharge (HD) events for large b-values, and thus shorter Ts80% values than time series with small b exponents. Shorter sediment load exceedance times, in turn, increase the uncertainty of the annual load calculation for any given sampling interval. Therefore, there is a direct link between the b exponent of the sediment rating and the required sampling interval, with large b exponents requiring shorter sampling intervals to reduce the uncertainty of estimated annual suspended sediment loads below a given error threshold (Moatar et al., 2020). The impact of rating coefficient a is negligible compared with that of exponent b. Hence, physical properties of the river or contributing catchment cannot be inferred from a alone (Asselman, 2000).
The error term ε describes the fit of the rating curve through the scatter of the data points around the NLS rating curve; it thus indicates whether the rating method introduces further uncertainties or whether the curve can be applied to the data set (Koch & Smillie, 1986). The highest ε-values occur at the station with the highest load flashiness (station WEI, Table 1), leading to a strong increase in Ts80% when the NLS regression is used (Table 2). Here, the fit of the NLS regression is strongly affected by heterogeneous sediment transport, with high sediment transport during comparably low discharges. This might be caused by the remobilization of accumulated eroded material once discharge and surface runoff exceed a certain threshold (Vercruysse et al., 2020). Changing the scatter around the optimal regression curve by ±20% and ±50% introduces further difficulties. Calculating SSC with random ε-values can produce negative concentrations and thus negative sediment loads (Asselman, 2000). Setting these negative values to 0 leads to an overestimation of annual loads with increasing ε. At the same time, overall precision decreases because the randomly added ε-values are normally distributed, reducing the overall range of SSC values.
Rating methods for surrogate measurements strongly depend on manual sampling, and extensive coverage by both calibration and surrogate time series is necessary (Cheviron et al., 2014). With comparable measurement equipment at the four stations, a total of 2,332 manual samples, more than 1,500,000 sensor measurements, and high-resolution discharge data, we expect our data set to be robust enough to test our approach to quantifying the variability of annual load estimates based on 15-min surrogate time series.

Conclusions
Uncertainties in annual load estimates arising from the temporal variability of sediment transport and discharge were assessed in four (agricultural) catchments in Germany (Rhine, Moselle, Ilz, and Ammer rivers) covering variable topography (small mountainous to large complex river systems) and discharge regimes (snow-fed to pluvio-nival). Our findings extend existing studies of the relationship between uncertainties in annual load estimates and sediment load exceedance times (as a measure of the variability of sediment transport) by incorporating high-resolution sensor data with highly accurate sediment load estimates. The four stations represent different hydrological properties (varying load/flow flashiness) and catchment characteristics.
Our main findings include:
1. Based on the variability of sediment transport (represented by Ts80%), a minimum sampling frequency can be derived to obtain reliable annual load estimates (accuracy <20% compared to the reference load). Users can also adapt the accepted uncertainty range if required. The sampling interval can then be chosen either to (a) resolve annual load estimates with high precision and accuracy or (b) estimate reliable long-term averages, depending on the user's aim. However, we show that catchments with high variability require high-resolution monitoring, because daily sampling does not resolve punctuated flood events. Establishing the sediment load exceedance time therefore requires surrogate sampling techniques and frequent calibration samples.
2. The sediment load exceedance time (Ts80%) is a reliable measure of variability. When SSC is measured at low frequency, interpolating SSC with auxiliary daily discharge data slightly reduces uncertainties in annual load estimates. Sediment rating then enables the calculation of reliable Ts80% values from low-frequency SSC samples. Daily discharge data are indispensable for using Ts80%, but (sub)daily SSC measurements are still necessary. For catchments <1,000 km² with high load flashiness, surrogate sampling and continuous maintenance are necessary, because infrequent sampling does not resolve punctuated flood events. Based on high-frequency 15-min sensor measurements, we link uncertainties in annual load estimates to Ts80%. The resulting negative exponential trend can be used to obtain the minimum required sampling interval.
3. Rating exponent b has a higher impact on annual load estimates than a or ε. We found a negative linear trend between rating parameter b and Ts80%, meaning that Ts80% decreases with increasing b. Catchments with a more rapidly changing regime and higher load flashiness therefore tend to have higher b-values (steeper rating curves). Regimes with a longer response time to hydro-meteorological changes and less readily available sediment tend to have lower b-values. Sediment rating parameters derived from high-resolution data differ distinctly from parameters derived from infrequent manual sampling. As a result, the temporal variability of a river system might be over- or underestimated if sediment rating parameters are derived from low-resolution data sets.
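Finding 1 can be made explicit under an assumed functional form. Suppose the negative exponential trend is written as ∆Qs,n = c·exp(−k·Ts80%), with fitted constants c > 0 and k > 0 for a given sampling interval (c, k, and T* are symbols introduced here for illustration). The 20% criterion can then be solved for the Ts80% above which that interval suffices:

```latex
\Delta Q_{s,n} = c\,e^{-k\,T_{s80\%}}, \qquad c > 0,\; k > 0

c\,e^{-k\,T_{s80\%}} \le 0.2
\;\Longleftrightarrow\;
T_{s80\%} \ge T^{*} = \frac{1}{k}\,\ln\!\left(\frac{c}{0.2}\right) = \frac{\ln(5c)}{k}
```

Because k > 0, the error falls as Ts80% grows. A station with Ts80% ≥ T* for a given interval can therefore be sampled at that interval. If c ≤ 0.2, then T* ≤ 0 and the condition holds for every station.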
Our approach helps refine sampling schemes and improve the reliability of monitoring data. Our study covers only four monitoring stations and therefore has limited global representativeness. Nevertheless, we argue that the presented approach can be applied to catchments at continental to global scales to assess uncertainties in annual load estimates, provided that the river systems show power-law sediment rating relationships between SSC and Q.

The data used in this paper are taken from the suspended sediment monitoring network of the German waterways. This network was established by the Federal Waterways and Shipping Administration (Wasserstraßen- und Schifffahrtsverwaltung des Bundes, WSV) and the Bavarian Environment Agency (Landesamt für Umwelt, LfU). We acknowledge the WSV and the LfU for maintaining the monitoring network and for water sampling. Furthermore, we thank Stefan Talke and three anonymous reviewers for their helpful comments and suggestions, which greatly improved the quality of this paper. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
-Augusta-Anlagen Koblenz, COC = Cochem, KAL = Kalteneck (Hutthurm), WEI = Weilheim (i.OB); coordinates (N, E) in decimal degrees (WGS84); MQ = mean annual discharge; Q = discharge; SSC = suspended sediment concentration; Qs = sediment load; Ts80% = time necessary to transport 80% of the sediment load; Tw80% = time necessary to transport 80% of the water volume; mean precipitation from the nearest station between 1991 and 2020; n = total number of sensor measurements (missing values excluded); missing values = NAs in the time series caused by sensor malfunction or maintenance. The calibration period represents the time series of manual samples available at the time of publication; manual sampling is continued. Calibration samples are manual samples with the same timestamp as surrogate samples. RSE = residual standard error, representing the fit of the NLS rating curve based on the calibration samples.
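The sediment load Qs in these notes follows the usual flux convention: discharge times concentration, summed over time. The sketch below shows this computation under several assumptions. Q is in m³/s, SSC is in mg/L (equal to g/m³), and intervals are 15 min long. The function names are hypothetical. Skipping NaN intervals is an assumption here, not the authors' documented gap handling.

```python
import numpy as np

def interval_loads_t(q_m3s, ssc_mgl, dt_s=900.0):
    """Sediment load in tonnes per interval (hypothetical helper).

    Q [m^3/s] * SSC [mg/L = g/m^3] gives a flux in g/s; multiplying by the
    interval length in seconds gives grams, and dividing by 1e6 gives tonnes.
    """
    q = np.asarray(q_m3s, dtype=float)
    ssc = np.asarray(ssc_mgl, dtype=float)
    return q * ssc * dt_s / 1e6

def total_load_t(q_m3s, ssc_mgl, dt_s=900.0):
    """Total load in tonnes; intervals with missing Q or SSC (NaN) are skipped."""
    return float(np.nansum(interval_loads_t(q_m3s, ssc_mgl, dt_s)))

# One day of 15-min values at Q = 100 m^3/s and SSC = 50 mg/L
print(total_load_t(np.full(96, 100.0), np.full(96, 50.0)))  # 432.0
```

Skipping NaN intervals biases totals low if gaps coincide with high flows, so gap filling may be preferable in practice.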

Figure 2. (a) Variability of the normalized sediment load (over the entire time series) for single stations. Sampling intervals were tested with bootstrapping (100 replicates). Uncertainty margins of 10% and 20% are marked by dark (0.9-1.1) and light (0.8-1.2) gray shaded areas. The dashed horizontal line (at 1.0) marks the normalized reference sediment load (calculated as the daily average from the 15-min sampling interval). The horizontal line in each boxplot shows the median normalized sediment load of the 100 replicates. Uncertainties and underestimation increase with the sampling interval. (b) Annual loads calculated with infrequent SSC values; missing SSC values were interpolated with LOWESS based on mean daily discharge data. Bootstrapping as in panel (a).
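The bootstrap in panel (a) can be sketched as follows. This is a minimal sketch, not the authors' implementation, and it makes several assumptions:

- the input is one year of 15-min SSC (n_days × 96) plus mean daily discharge;
- one random 15-min SSC value is drawn per sampling period;
- that value is held constant over the period;
- the resulting load is normalized by a reference load computed from daily-mean SSC.

The function names and the hold-constant rule are hypothetical.

```python
import numpy as np

def reference_load_t(ssc_15min, q_daily):
    """Reference load (t) from daily-mean SSC and mean daily discharge."""
    ssc_daily = np.nanmean(ssc_15min, axis=1)          # mg/L, one value per day
    return float(np.nansum(q_daily * ssc_daily * 86400.0 / 1e6))

def bootstrap_normalized_loads(ssc_15min, q_daily, interval_days, n_rep=100, seed=0):
    """Normalized annual loads for one sampling interval (hypothetical scheme).

    For each replicate, one 15-min SSC value is drawn at random within every
    block of `interval_days` days and held constant over that block.
    """
    rng = np.random.default_rng(seed)
    ssc_15min = np.asarray(ssc_15min, dtype=float)    # shape (n_days, 96)
    q_daily = np.asarray(q_daily, dtype=float)        # shape (n_days,), m^3/s
    n_days, n_slots = ssc_15min.shape
    reference = reference_load_t(ssc_15min, q_daily)
    ratios = np.empty(n_rep)
    for r in range(n_rep):
        ssc_sampled = np.empty(n_days)
        for start in range(0, n_days, interval_days):
            stop = min(start + interval_days, n_days)
            day = rng.integers(start, stop)           # random day in the block
            slot = rng.integers(0, n_slots)           # random 15-min slot
            ssc_sampled[start:stop] = ssc_15min[day, slot]
        load = np.nansum(q_daily * ssc_sampled * 86400.0 / 1e6)
        ratios[r] = load / reference
    return ratios

# With constant SSC, every replicate reproduces the reference load
ratios = bootstrap_normalized_loads(np.full((365, 96), 40.0),
                                    np.linspace(50.0, 500.0, 365), interval_days=7)
print(ratios.min(), ratios.max())
```

In this scheme, replicates that miss a short flood peak fall below 1. This matches the underestimation the caption reports for longer intervals.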

Figure 3. Flow duration curves of normalized discharge and normalized sediment load in relation to the 80% exceedance time (80% shown as a horizontal dashed line at 0.8) over the entire monitoring period for each station. Vertical dashed lines indicate the time necessary to transport 80% of the discharge (Tw80%) and of the sediment load (Ts80%). The exceedance time in brackets translates to the mean exceedance time per year; for example, at station WEI, 80% of the annual sediment load is transported in 10 days. A steeper curve indicates higher variability of (sediment) transport. The discharge exceedance time is significantly larger than the sediment exceedance time and does not correlate with it.
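An exceedance time such as Ts80% (or Tw80%, applied to water volumes) can be read from such a duration curve. The procedure ranks intervals from the largest to the smallest load and counts them until their cumulative share reaches 80%. The sketch below returns the result both in days and as a percentage of the observed time. The function name is hypothetical, and dropping NaN intervals is an assumption.

```python
import numpy as np

def exceedance_time(loads, fraction=0.8, dt_s=900.0):
    """Time needed to transport `fraction` of the total load (e.g. Ts80%).

    Intervals are ranked from largest to smallest load, as in a duration
    curve, and counted until their cumulative share reaches `fraction`.
    Returns (days, percent of the observed time).
    """
    x = np.asarray(loads, dtype=float)
    x = x[~np.isnan(x)]                        # drop missing intervals
    ranked = np.sort(x)[::-1]                  # largest loads first
    share = np.cumsum(ranked) / ranked.sum()   # cumulative fraction of the total
    n_needed = min(int(np.searchsorted(share, fraction)) + 1, x.size)
    days = n_needed * dt_s / 86400.0
    percent = 100.0 * n_needed / x.size
    return days, percent

# Constant transport: 80% of the load needs 80% of the time
print(exceedance_time(np.ones(100)))
```

A perfectly uniform series gives 80% of the time. Event-driven series give much smaller values, which is why a low Ts80% signals high variability.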

Figure 4. (a) Relationship between ∆Qs,n and Ts80% for stations KAA, COC, KAL, and WEI. The scatter plot comprises two sampling intervals (1/week and 3/month) for every year in the monitoring period. The black curve represents the NLS regression using a negative exponential relationship between ∆Qs,n and Ts80%. (b) Relationship between ∆Qs,n and Tw80% for stations KAA, COC, KAL, and WEI. The scatter plot comprises two sampling intervals (1/week and 3/month) for every year in the monitoring period. (c) Relationship between ∆Qs,n and Ts80% for stations KAA, COC, KAL, and WEI, as in panel (a). In contrast to (a), sediment loads in (c) are calculated from infrequent SSC data sets with LOWESS interpolation based on daily discharge data. The scatter plot comprises two sampling intervals (1/week and 3/month) for every year in the monitoring period. Lines represent ∆Qs,n calculated from the NLS fit on Ts80%. The negative exponential relationship from panel (a) is maintained.

Figure 5. Uncertainty of the sediment load (∆Qs,n) as a function of the sediment exceedance time, with an NLS regression for each sampling interval over the entire data set of four stations. The intersection with 0.2 indicates which sampling interval is necessary to obtain annual load estimates with an error <20%. Dashed lines are from low-frequency SSC time series with LOWESS interpolation based on daily discharge data (as plotted in Figure 4c). Solid lines are from low-frequency "synthetic" time series derived from 15-min turbidity data (as plotted in Figure 4a).
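Choosing the required sampling interval from these curves can be sketched in a few lines. The sketch assumes the negative exponential form ∆Qs,n = c·exp(−k·Ts80%). To stay NumPy-only, it fits this form by least squares on ln(∆Qs,n) rather than by NLS as in the figure. This log-linear fit weights residuals differently and needs strictly positive ∆Qs,n. The function names and the dictionary of per-interval curves are hypothetical. The station's Ts80% must be given in the same unit used for fitting.

```python
import numpy as np

def fit_neg_exp_loglinear(ts80, delta):
    """Fit delta = c * exp(-k * ts80) via a straight line through ln(delta).

    Log-linear least squares, used here as a stand-in for NLS; needs delta > 0.
    """
    ts80 = np.asarray(ts80, dtype=float)
    log_delta = np.log(np.asarray(delta, dtype=float))
    slope, intercept = np.polyfit(ts80, log_delta, 1)
    return float(np.exp(intercept)), float(-slope)   # (c, k)

def coarsest_interval(curves, ts80_station, max_error=0.2):
    """Largest sampling interval whose fitted curve predicts an error <= max_error.

    curves maps an interval length (e.g. in days) to its fitted (c, k);
    returns None if no interval meets the error limit.
    """
    ok = [interval for interval, (c, k) in curves.items()
          if c * np.exp(-k * ts80_station) <= max_error]
    return max(ok) if ok else None

ts = np.array([1.0, 2.0, 4.0, 8.0])
print(fit_neg_exp_loglinear(ts, np.exp(-0.5 * ts)))   # close to (1.0, 0.5)
print(coarsest_interval({1: (0.5, 1.0), 7: (1.0, 0.5), 30: (2.0, 0.1)}, 5.0))  # 7
```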

Figure 6. SSC ∼ Q scatter plot in log-log scale from calibration samples for COC. NLS regression performed for the optimized a and b values (red), as well as with the a-parameter and the b-exponent changed by ±20% and ±50%. If b is changed, a is set to a_opt (panel a); if a is changed, b is set to b_opt (panel b).
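The scenario design in this figure varies one rating parameter by ±20% or ±50% while holding the other at its optimum. The sketch below reproduces that design. It assumes the power law SSC = a·Q^b and, as a simplification, fits it by ordinary least squares on log10 values instead of the NLS regression used for the figure. The two methods generally give different a and b, because they minimize residuals in different spaces. The function names are hypothetical, and ε is not modelled.

```python
import numpy as np

def fit_rating_loglog(q, ssc):
    """Fit SSC = a * Q**b as a straight line in log10 space (OLS, not NLS)."""
    b, log10_a = np.polyfit(np.log10(q), np.log10(ssc), 1)
    return float(10.0 ** log10_a), float(b)

def rating_scenarios(a_opt, b_opt, changes=(-0.5, -0.2, 0.2, 0.5)):
    """One-at-a-time perturbations: (a, b) pairs with b varied, then with a varied."""
    vary_b = [(a_opt, b_opt * (1.0 + c)) for c in changes]   # a fixed at a_opt
    vary_a = [(a_opt * (1.0 + c), b_opt) for c in changes]   # b fixed at b_opt
    return vary_b, vary_a

q = np.array([1.0, 10.0, 100.0, 1000.0])
a_opt, b_opt = fit_rating_loglog(q, 2.0 * q ** 1.2)
vary_b, _ = rating_scenarios(a_opt, b_opt)
print([round(b, 2) for _, b in vary_b])   # [0.6, 0.96, 1.44, 1.8]
```

With b_opt = 1.2, the four perturbed exponents are 0.6, 0.96, 1.44, and 1.8. These are the b-scenarios listed for COC in Figure 7.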

Figure 7. Uncertainties of normalized annual load estimates for five sampling intervals and five b-exponent scenarios for station COC (monitoring period 2010-2019). b = 1.2 represents the optimized fit of the NLS regression, and the reference is based on this optimized NLS regression. Uncertainties and variability increase with increasing b (1.44 and 1.8), whereas variability decreases when b is decreased (0.96 and 0.6).
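A toy calculation shows why changing b spreads the normalized loads while changing a would not. The sketch assumes loads are computed as the sum of Q·a·Q^b over equal-length intervals, a hypothetical simplification that omits ε and unit conversions. Under this assumption, a cancels in the normalized load. A change in b, by contrast, reweights each interval by Q^Δb, so high-flow intervals gain or lose weight. The function name and discharge values are illustrative, not COC data.

```python
import numpy as np

def normalized_loads(q, a, b_opt, b_factors=(0.5, 0.8, 1.0, 1.2, 1.5)):
    """Load with exponent b_opt*f divided by the load with b_opt, for each factor f.

    Load is taken as sum(Q * a * Q**b) over equal-length intervals, so the
    interval length and the coefficient a cancel in the ratio.
    """
    q = np.asarray(q, dtype=float)
    reference = np.sum(q * a * q ** b_opt)
    return [(b_opt * f, float(np.sum(q * a * q ** (b_opt * f)) / reference))
            for f in b_factors]

# Illustrative daily discharges (m^3/s): mostly low flow with one flood day
q_demo = np.array([20.0] * 9 + [400.0])
for b, ratio in normalized_loads(q_demo, a=1.0, b_opt=1.2):
    print(f"b = {b:.2f}: normalized load = {ratio:.2f}")
```

Because Q^Δb depends on the units of Q, these ratios are only illustrative. Scaling a by ±20%, by contrast, would scale every load by exactly that factor.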

Figure 8. Relationship between Ts80% and rating exponent b for all stations. The length of the boxes shows the annual variability of Ts80% (each point represents the Ts80% of a single year). Solid lines represent linear trends.
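A trend line of this kind can be drawn as an ordinary least-squares line through all (b, annual Ts80%) pairs; a negative slope corresponds to the decrease of Ts80% with b. The sketch assumes one rating exponent b per station and several annual Ts80% values per station, as the boxes suggest. This is an assumption about how the lines were fitted. The function name and the station keys in the example are hypothetical.

```python
import numpy as np

def trend_ts80_vs_b(stations):
    """OLS line through all (b, annual Ts80%) pairs.

    stations maps a station code to (b, list of annual Ts80% values);
    each station contributes one point per year at its own b.
    Returns (slope, intercept).
    """
    b = np.concatenate([np.full(len(ts), bval) for bval, ts in stations.values()])
    ts80 = np.concatenate([np.asarray(ts, dtype=float) for _, ts in stations.values()])
    slope, intercept = np.polyfit(b, ts80, 1)
    return float(slope), float(intercept)

# Illustrative values only (not data from the monitoring stations)
example = {"X": (0.6, [45.0, 50.0, 40.0]), "Y": (1.0, [30.0, 28.0]), "Z": (1.4, [12.0, 15.0])}
print(trend_ts80_vs_b(example))   # negative slope
```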

Figure 9. Change of the b-exponent with different sampling/averaging schemes. The vertical dashed line represents the b-exponent from the calibration data (based on all manual samples). The 0/30 sampling scheme uses 0 samples at high discharge (HD) and 30 samples during random discharge (RD). The 5/25 sampling scheme uses 5 samples during HD and 25 samples during RD. Daily averages are calculated for the calibration period, because multiple calibration samples are available for some days. Weekly and monthly averages of the calibration samples are calculated as well. The random (RD) sampling schemes use the 15-min data set with 100 repetitions to calculate rating exponent b.
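The random HD/RD sampling schemes can be sketched as repeated subsampling of the 15-min record, each followed by a rating fit. The sketch rests on three hypothetical assumptions:

- "high discharge" means Q at or above the 90th percentile;
- b is estimated by OLS on log10 values rather than by NLS;
- high- and random-discharge draws are independent, so an instant can occasionally be drawn twice.

```python
import numpy as np

def b_from_scheme(q, ssc, n_high, n_random, high_quantile=0.9, n_rep=100, seed=0):
    """Rating exponent b for n_rep random draws of an n_high/n_random scheme."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q, dtype=float)
    ssc = np.asarray(ssc, dtype=float)
    high_idx = np.flatnonzero(q >= np.quantile(q, high_quantile))  # HD instants
    all_idx = np.arange(q.size)                                     # RD pool
    b_values = np.empty(n_rep)
    for r in range(n_rep):
        picks = rng.choice(all_idx, size=n_random, replace=False)
        if n_high > 0:
            extra = rng.choice(high_idx, size=n_high, replace=False)
            picks = np.concatenate([picks, extra])
        # slope of log10(SSC) against log10(Q) is the rating exponent b
        b_values[r] = np.polyfit(np.log10(q[picks]), np.log10(ssc[picks]), 1)[0]
    return b_values

# e.g. compare the 0/30 and 5/25 schemes on the same 15-min series:
# b_0_30 = b_from_scheme(q15, ssc15, n_high=0, n_random=30)
# b_5_25 = b_from_scheme(q15, ssc15, n_high=5, n_random=25)
```

On a noise-free power law every scheme returns the same b. Differences between schemes only appear once the data scatter around the rating curve, as real calibration data do.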
Note. Ts80% increases from the sensor time series to the NLS-based series. Ts80% decreases with increasing b. Ts80% increases with ε for COC, KAL, and WEI.

Table 2
Mean Ts80% for the Optimal NLS Fit and for Varying b-Exponent, Parameter a, and ε Values (in %)