Uncertainties of GRACE‐Based Terrestrial Water Storage Anomalies for Arbitrary Averaging Regions

The application of terrestrial water storage (TWS) data observed with GRACE and GRACE‐FO often requires realistic uncertainties. For gridded TWS data, this requires the knowledge of the covariances, which can be derived from the formal, i.e., formally estimated in the parameter estimation, variance‐covariance matrix provided together with the Stokes coefficients. However, the propagation of monthly variance‐covariance matrices to TWS data is computationally expensive, so we apply a spatial covariance model for TWS data. The covariance model provides non‐homogeneous (location depending), non‐stationary (time depending), and anisotropic (orientation depending) covariances between any two given points. Further, the model accommodates wave‐like behavior of East‐West‐directed covariances, which residuals of GRACE striping errors can cause. The main application of such spatial covariances is the estimation of uncertainties for mean TWS time series for arbitrary regions such as river basins. Alternatively, regional uncertainties can be derived from the above mentioned formal variance‐covariance matrices of the Stokes coefficients. This study compares modeled basin uncertainties for GFZ RL06 and ITSG‐Grace2018 TWS data with the formal basin uncertainties from the ITSG‐Grace 2018 solution. The modeled and formal uncertainties fit both in the spatial and temporal domain. We further evaluate the modeled uncertainties by comparison to empirical uncertainties over arid regions. Here, again the appropriateness of the modeled uncertainties is shown. The results, namely the TWS uncertainties for global river basins, are available via the GravIS portal. Further, we provide a Python toolbox, which allows computing uncertainties and covariance matrices.

GRACE and GRACE-FO data complement other available remote sensing methods of the terrestrial water cycle by providing direct estimates of the total water storage from surface to groundwater independently of its exposure. By introducing those estimates into the water budget equation, GRACE has been very beneficial to evaluate net-water fluxes as given by global atmospheric re-analyses (Eicker et al., 2020) and to characterize large-scale flood events (Gouweleeuw et al., 2018). GRACE is also frequently being used to assess regional sea-level budgets to discriminate between temperature-and mass-driven sea-level rise (e.g., Rietbroek et al., 2016), or even high-frequency wind-driven signals (Schindelegger et al., 2021). However, all such water mass budget analyses require a rigorous uncertainty assessment of each of the data sets involved to allow for the identification of potentially remaining systematic errors. Also, for the assimilation of GRACE-based terrestrial water storage (TWS) data into numerical models (e.g., Schumacher et al., 2018) realistic estimates of the observation uncertainties are an essential prerequisite. This study aims to provide easy-to-use TWS uncertainties to a broad user group focusing on the hydrology community.
Sensor data from various instruments aboard the twin satellite missions of GRACE and GRACE-FO are needed to calculate a monthly gravity field. The two spacecrafts measure the range and range-rates between them with microwaves in the K-band. Accelerometers with three sensitive axes allow distinguishing non-conservative forces. Star cameras provide attitude information of each satellite, and GPS receivers as well as satellite laser ranging reflectors aid precise orbit determination. Data from all instruments are typically accumulated over 30 days and combined with time-variable background models for tides and non-tidal atmosphere and ocean variability to yield a single global gravity field in terms of spherical harmonic or Stokes coefficients up to degree and order 96.
However, the uncertainty of those gravity fields varies from month to month. Both GRACE and GRACE-FO operate in non-repeat orbits, which can occasionally degrade into a short-repeat orbit for some time (e.g., February 2015). After November 2016, accelerometer data became unavailable for GRACE-B due to energy limitations caused by failing battery cells. Thus, non-gravitational accelerations observed on GRACE-A are transplanted to the orbit of GRACE-B (Bandikova et al., 2019). A similar approach is also implemented for GRACE-FO due to an underperformance of the GRACE-D accelerometer. Further, the high-precision satellite-to-satellite tracking data are sampled along the near-polar orbit, which leads to a longer correlation length in the longitudinal direction. These uncertainty sources lead to non-homogeneous (spatially varying), non-stationary (temporally varying) and anisotropic (direction depending) covariances of the monthly-averaged GRACE and GRACE-FO data products.
Each gravity field provided as spherical harmonic coefficients is accompanied by a fully populated variance-covariance matrix of the gravity field parameters obtained from the least squares adjustment, which is proportional to the inverse normal equation matrix. The quality of those so-called formal uncertainties depends on (a) the quality of the linearized functional model relating gravity field and instrument-related calibration parameters to the observations; and (b) the stochastic model used during the inversion as the weight matrix of the different groups of observations. So far, the variance-covariance matrices of different analysis centers are not directly comparable.
In this work, we employ the variance-covariance matrices provided by ITSG-Grace2018 and ITSG-Grace_operational Mayer-Gürr et al., 2018). Both data sets employ an empirically estimated stochastic model of the input observations, which takes temporal correlations of the inter-satellite ranging and GPS data into account and provides a proper relative weighting between these two observation groups. The stochastic model is derived during the adjustment process using variance component estimation (Koch & Kusche, 2002). A thorough description of the approach can be found in Ellmer (2018). Neglecting temporal correlations of the observations or using ad-hoc weighting factors for the different observation groups typically does not affect solution performance but results in too optimistic formal uncertainties (Meyer et al., 2019). The formal uncertainties of the GRACE and GRACE-FO solutions from ITSG fit very well to empirical estimates , so that we anticipate that these variance-covariance matrices are a close approximation of the actual error structure. It should be noted that, while considerable effort has been put into the formal error derivation, they remain an approximation to the true error structure. In practice, simplifications in the observation noise model must be made either because of a lack of exact knowledge of instrument behavior and interactions or because of computational aspects. For example, post-fit residual analysis has shown that changes in environmental conditions and instrument configuration affect the noise characteristics for short periods (Bandikova et al., 2012;Goswami et al., 2018;Harvey et al., 2017).
The assumption of stationarity for the instrument noise covariance function within a month cannot capture these variations. Further,  have shown that treating background model deficiencies as stochastic error sources improves the formal error description. So far, only uncertainties in the non-tidal dealiasing model have been considered, while ocean tide models are treated as error-free.
From the variance-covariance matrix of the Stokes coefficients, the variance for any arbitrarily shaped averaging region can be derived. However, such a computation also requires rigorous variance propagation through the applied filtering method, which might not be feasible for every user who intends to apply GRACE and GRACE-FO data in hydrologic research. Please note that any change in the averaging region requires a re-calculation of the signal and its uncertainties. This re-calculation needs to be started again in the spherical harmonics domain as storing the full variance-covariance matrix in the spatial domain requires ∼32 GB per monthly solution, which is unfeasible for most users.
In order to make the error structure of GRACE and GRACE-FO readily accessible in the spatial domain, various covariance models have been developed in the past. For example, Landerer and Swenson (2012) proposed a spatial covariance model that is both homogeneous and stationary for use in JPL's Tellus portal. A more recent model proposed by  provides non-homogeneous, non-stationary, and anisotropic covariances but has been applied so far only to the simulated GRACE-FO terrestrial water storage (TWS) data of Flechtner et al. (2016). A covariance model allows any data user unfamiliar with the spherical harmonics representations or variance propagation to obtain uncertainty estimations for any particular averaging region's time series at minimal computational costs. For 100 global discharge basins and 92 climatically similar regions, the uncertainty time series together with the TWS time series can be downloaded directly from GFZ's GravIS portal (gravis.gfz-potsdam.de). In order to enable the user to use further the covariance model, we provide a Python toolbox that computes both the regional time series, uncertainty, and covariance matrix for any user-chosen region (Boergens, 2021; https://git.gfz-potsdam.de/boergens/regional-tws-uncertainty).
In this work, we will apply the covariance model of  on two different TWS time-series from GRACE and GRACE-FO data. In Section 2, we will provide an introduction to the GRACE and GRACE-FO data sets of this study, namely, GFZ RL06, and ITSG-Grace2018 and ITSG-Grace_Operational. Section 3 explains the covariance model in detail. Section 4 provides a detailed comparison of the results gained with the covariance model applied to both GFZ and ITSG TWS data and formal uncertainties of the ITSG data followed by an empirical assessment of the modeled uncertainties in specifically selected regions with a distinctly arid hydroclimate (Section 5). The manuscript closes with a summary and discussion on the practical implementation into the GravIS portal (Section 6).

GRACE and GRACE-FO Data
We use two different monthly Level-2 (i.e., spherical harmonic) GRACE/GRACE-FO data sets processed by European research groups, namely GFZ RL06  and ITSG-Grace2018 Mayer-Gürr et al., 2018). For the GRACE-FO mission period, ITSG-Grace_operational extends ITSG-Grace2018 based on the same processing chain. Both data sets are given up to degree and order 96 and cover the timeframe from April 2002 to September 2020.
For GFZ RL06, coefficients of low degree and order (C20, C21, S21, and C30 after November 2016) are replaced by estimates based on a combination of Satellite Laser Ranging with GRACE and GRACE-FO . Subtracted is the ICE-6G_D (VM5a, Peltier et al. (2018)) model of glacial isostatic adjustment effects on the gravity field, and degree-1 coefficients are estimated following an improved variant of Swenson et al. (2008). A potentially aliased tidal signal with a period of 161 days is removed and the Stokes coefficients are filtered with the time-variable anisotropic de-correlation filter VDK3 (Horvath et al., 2018). The smoothing of VDK3 is similar to smoothing with DDK3 filter and is in the global mean comparable to a Gaussian filter with 360 km half-width. The such processed Stokes coefficients are also publicly available as L2B data set  and are subsequently synthesized on a global 1°grid for this study . Please note that this gridded data set is not entirely identical with the Level-3 data available at the GravIS portal, where months with very poor orbit geometry are filtered with VDK2 instead of VDK3. This simplification is necessary for the ITSG data processing where the VDK filtering is not done operationally. It has negligible effects on the results of this study. Further, the official Level-3 data combines VDK3 and VDK5, where the latter is used for the deterministic, i.e., trend, annual, and semiannual signals and the former for the residual signals. We only use the residual signals in this work; thus, we refrained from processing the data set with VDK5.

10.1029/2021JB022081 4 of 13
The post-processing of the ITSG spherical harmonic data has been fully aligned to the processing choices made by GFZ, which in particular includes applying the VDK3 filter. A unique feature of the ITSG processing chain is the stochastic modeling of measurement noise and background model uncertainties. During the least squares adjustment process, a temporally stationary empirical noise model for all input observables is co-estimated with the gravity field parameters as given in Ellmer (2018, Chap. 5). Additionally, deficiencies in background models, which are rather stationary in space than in time, are also considered . As a result of this detailed stochastic modeling, the formal errors of the gravity field parameters represented by their variance-covariance matrix approximate the actual error structure very well. The proficiency of this approach has been shown in both comparisons with empirical uncertainty estimates (Horvath et al., 2018;Meyer et al., 2019) and full-scale simulation studies (Poropat et al., 2020).
In order to utilize the formal variance-covariance matrices to estimate TWS uncertainties, they have to be variance propagated through the post-processing steps described above, namely the filtering with the VDK filter. The filtering of the Stokes coefficients a nm (a nm = c nm if m ≥ 0, a nm = s n−m if m < 0), with their covariance matrix Σ{a nm }, can be expressed as the matrix multiplication with the filter matrix W α and the filtered coefficients . Thus, according to the law of variance propagation, the filtered variances and covariances are, The filtered Stokes coefficients, as well as their filtered variance-covariance matrices, are publicly available .
Following Swenson and Wahr (2002, Equation 17) one can obtain the averaged TWS time series over a given region directly from the Stokes coefficients: where w i is the cell size of the ith grid point at the spherical coordinates (r i , λ i , θ i ). M is Earth's total mass, R is the reference radius of the Stokes coefficients, ρ is the density of water, k n are load Love-numbers and Y nm are surface spherical harmonics. Note that this relation between the regional average and the Stokes coefficients can be expressed in matrix-vector notation as ( ) = ( ) . The matrix F can be precomputed for each region as it is time-independent.
In order to arrive at the uncertainties of such a TWS time series, variance propagation has to be applied. Due to the linear nature of F, this can be done straightforwardly:

Spatial Covariance Model
To describe the spatial covariances, we apply a mathematical model that is anisotropic, non-homogeneous, and non-stationary . By being non-stationary, the model might vary in time to accommodate different orbit characteristics or changes in instrument performance, but it does not describe correlations in time. Each monthly solution is still considered an independent snap-shot of the gravity field.
The covariance between the TWS values tws i and tws j at (λ i , θ i ) and (λ j , θ j ) consists of a time-independent structure Corr scaled by time-depending TWS standard deviations σ i/j (t): The covariance model's time-independent part, Corr, is further called correlation model, although it does not describe correlations in a strict sense, i.e., it is not bound by −1 to 1. Corr is latitude depending (and thereby non-homogeneous) and allows for different correlation lengths in longitudinal and latitudinal direction (and 10.1029/2021JB022081 5 of 13 is thus anisotropic). Supporting Information S1 describes in more detail the correlation model. The standard deviations σ i (t) and σ j (t) introduce the time-dependence (i.e., non-stationarity).
Both the parameters of the correlation model and the point standard deviations are empirically estimated from the GRACE and GRACE-FO gridded TWS data. The time-depending standard deviations are the spread of the residual TWS data (TWS red , trend, annual, and semiannual signals removed) over the open ocean weighted by the grid cell size w i calculated individually for each monthly solution: with a the number of open ocean grid points. Please note that for any given time step, all grid points will have the same "raw' standard deviation, i.e., σ i (t) = σ j (t) = σ(t). However, the standard deviation of a TWS value at such a grid point is a scaled version of σ(t) according to Equation 5. We consider the spread of the signal over the open ocean, i.e., a minimum distance of 1,000 km to the coast, as best representing the noise level of the GRACE and GRACE-FO data set (e.g., Meyer et al., 2019). Continental residual TWS still contains several interannual signals superimposed onto the month-to-month variation of the noise floor.
Empirical covariances are calculated from residual continental TWS data TWS red , following the work of Wahr et al. (2006), and used to estimate the parameters of the correlation model Corr. So far, the period of GRACE-FO is too short for deciding whether two separate sets of model parameters for GRACE and GRACE-FO should be estimated. The use of the empirical covariances provides an upper bound to the true uncertainties as we do not remove possible geophysical variability beyond a trend or annual and semiannual signals. As in high latitudes, only a few continental landmasses are present, which are predominantly covered by ice, we consider only empirical covariances up to a latitude of 70°Ṫhe correlation model is estimated individually for GFZ and ITSG. We found the parameters to be equivalent within the parameter uncertainties so that only a common set of parameters for Corr is considered in the following. Therefore, the covariance models of GFZ and ITSG only differ in their point standard deviations σ(t).
The covariance model is subsequently applied to calculate uncertainties of average TWS time series for an arbitrary region as a measure of variance: were a is the number of grid points in the region with w the area weight to each. In this study, we rather use standard deviation than variance, thus model( ) = √ model( ) . In  we found that the model estimated from empirical covariances results in globally overestimated uncertainties that were corrected a posteriori by a scaling factor. In this study, we adapt the parameter estimation to include this scaling factor such that globally the size of the estimated uncertainties of ITSG fit uncertainties derived from the formal ITSG covariances during the years 2005 and 2010. This period is considered the most stable phase of the GRACE mission with fully functioning satellites, low solar radiation, and stable orbit altitude. Table. 1 gives the resulting model parameters.

Comparison Between Modeled and Formal Uncertainties
As averaging regions, we use 156 polygons that primarily follow the boundaries of discharge basins but merge smaller basins not resolvable by satellite gravimetry with the larger neighboring regions. By this, the whole surface of the Earth"s continents apart from Greenland and Antarctica is covered. The covariance model described above is used to calculate the standard deviations of the average TWS time series for each basin, referred to as ( )  directly from the variance-covariance matrix of the ITSG Stokes coefficients by applying Gaussian variance propagation. In the following, we will label those as formal covariances and ( ) for the basin uncertainties.
Exemplarily, we focus on the temporal evolution of the basin standard deviations in the Yenisey and Orinoco basins (Figure 1), that deviate from each other in terms of latitude, hydroclimate, and size (2.6 Mio km 2 and 1 Mio km 2 , respectively). In both basins, standard deviations display a significant increase at the end-of-life phase of GRACE. Further, a high peak in December 2002 and an increase around 2012 are detectable through all time series. However, for the Yenisey basin, the modeled standard deviation ( ) and ( ) are larger than ( ) . In contrast, the modeled ITSG standard deviations in the Orinoco basin are lower than GFZ modeled and ITSG formal.
We observe a ∼161-day oscillation of the formal uncertainties in the GRACE-FO period. This oscillation has been observed in earlier GRACE releases (Cheng & Ries, 2017) and could have two causes. First, it can be caused by S2 tidal aliases; second, it can be related to the orientation of the orbital plane with respect to the sun (beta prime angle). So far, we empirically estimate and remove this oscillation jointly from both the GRACE and GRACE-FO spherical harmonics in the data processing. The residual signal we observe in the GRACE-FO period indicates either a phase or amplitude shift between the GRACE and GRACE-FO period. There is no reason to expect a phase shift between the two mission periods in the S2 tidal aliases, while we cannot necessarily expect the same phasing of the beta prime angle. Unfortunately, to date, the period of GRACE-FO is too short for two separate signal estimations. Additionally, any accelerometer error of GRACE-FO is indistinguishable from the effects of the beta prime angle. Thus, due to the problems with the GRACE-FO accelerometer, we have to expect a larger error amplitude for this observation phase.
Furthermore, we see a slight drop in the formal uncertainties of GRACE-FO, which is a result of a change in the ITSG processing between GRACE and GRACE-FO with an increased sampling of the incorporated GPS observations. The correlation between the time series of ( ) and ( ) , or ( ) and ( ) , respectively, is 0.76 and 0.86 for the Orinoco basin and 0.77 and 0.85 for the Yenisey basin, indicating that the covariance model well reflects the time variations in the uncertainties.
Temporal correlation can also be computed for the other 154 basins. For the time series of ( ) , correlation ranges from 0.70 to 0.80, while the time series of ( ) show a higher temporal correlation between 0.81 and 0.88. Spatial maps of the correlations do not reveal any systematic pattern and are therefore not shown. It is generally encouraging that no obvious systematics (i.e., trends, annual or interannual variations) are visible in the time series of basin standard deviations. We, therefore, compute the temporal mean of the basin standard deviations, now denoted with , , and , to further investigate their spatial pattern ( Figure 2).
As expected, the spatial pattern of the two modeled basin uncertainties are nearly identical but have a different scale. The generally higher noise level causes the different scale in GFZ RL06 compared to ITSG, as reported earlier from other independent assessments. Compared to the modeled uncertainties, the formal uncertainties show a more extensive spread and a higher variability, especially between basins at similar latitudes. Our covariance model does not accommodate such a feature which is only latitude dependent.
We found additional spatial features in both the formal and the modeled uncertainties. These features include higher uncertainties along the African and North American coasts and the stark contrast between basins at the Indian sub-continent with neighboring basins. Overall, we found to be approximately 1.4 times larger than and . Figure 3 highlights the ratio between the formal and modeled basin standard deviations. We do not show a ratio between and between 0.7 and 1.3 in order to focus on the more significant deviations. Overall, as expected from Figure 2, fits better to the order of magnitude of than .
For , only 7 out of 165 basins have a ratio below 0.7, and 10 basins have a ratio larger than 1.3. The larger of the two South Asian basins, Ganges and Brahmaputra, is probably caused by temporal aliasing due to the significant sub-monthly hydrologic variations associated with occasional flooding observable in these basins resulting in larger formal uncertainties. The two Siberian basins with the largest ratios (Pjassina/Taymyra and Chatanga) are among the three northernmost regions considered in this study with a mean basin latitude beyond 70°N for which the covariance model was never optimized (see Section 3). At these high latitudes, magnitude decreases quite strongly with latitude in the covariance model, which can also be seen in Figure 2.
In general, we observe that in extensive basins together with pronounced East-West elongation tends to be smaller than , namely Amazon (rank 1 in total basin size), Mississippi (rank 4), North West and North East Interior (Sahara, rank 5 and 14, respectively), Amur (rank 11), and Turgai (rank 26). This effect is primarily caused by differences in spatial correlations in the latitudinal direction between the modeled and the formal correlations. While our covariance model principally includes a wave-like signature in the latitudinal direction to accommodate residuals of the GRACE striping errors (see Supporting Information S1 for more details), this wave effect is even more pronounced in the formal covariance matrix. However, as will be shown in Section 5, 10.1029/2021JB022081 8 of 13 empirical estimations of the uncertainty for the two Sahara regions agree well with ( ) and ( ) . This indicates an underestimation of the uncertainty of at least in these two regions. We ultimately concluded that no further refinement to our covariance model is needed at this point.
On the other hand, smaller basins with a predominate North-South elongation exhibit larger than , namely Irrawaddy, Malaysia, Sulawesi, Southern Pacific Coast (South America), Tocantins, and Ogooue/Central West Coast (Africa). These findings indicate that the anisotropy of the covariances is more pronounced in the formal covariances than in the modeled ones. For the mismatch between and in the South Atlantic Coast (Africa) basin, however, we have no further explanation besides that the shape and position of the basin are not ideal for GRACE and GRACE-FO observations. We anticipate, however, that this is not caused primarily by leakage of the ocean signal as the formal covariances are not influenced by signal leakage, and neither is the covariance model.
Concerning the other GRACE release considered in this study, GFZ RL06, we recall that the basin standard deviations of are globally larger than . Thus, the ratios are also shifted toward smaller values. This leads to only one basin ratio larger than 1.3, namely the Pjassina/Taymyra region discussed above. On the other hand, 34 basins exhibit a ratio lower than 0.7. Again a number of those are narrow coastal basins similar to the South Atlantic Coast (Africa) basin, namely Coastal Iran, East Arabic Coast, Eastern and Western Mediterranean Coast (Africa), North West Coast (Africa), and South Kolyma. If we factor in the 1.4 times larger values of , the same larger basins with major East-West expansion as for show a small ratio.
We also calculate the correlation coefficient between the temporal meanfields of modeled and formal basin-level uncertainties. This correlation coefficient describes the similarity of the spatial variations of the two fields regardless of a scaling factor. For both and , the correlation coefficient is 0.63, which we rate as a reasonably high similarity.
Further, we computed correlations between the basin uncertainty maps of each time step (Figure 4). This metric has values around 0.7 for the first years of GRACE that decline toward the end of the mission lifetime but return to a level of around 0.7 again for GRACE-FO. This indicates a temporal change of the spatial pattern of the formal uncertainties that is not reproduced with the covariance model, which might require further investigation. Since a change of the spatial correlations have not been identified in the empirical correlation on which the estimation of the covariance model is based, we continue working with a single realization of our covariance model.
and our spatial covariance model applied to ITSG-Grace2018 ( ( ) ) as well as to GFZ RL06 ( ( ) ) for each monthly solution of the GRACE/ GRACE-FO data record.

Comparison to Empirical Uncertainty Estimations
We now test the uncertainty assessment approach presented above for somewhat artificially shaped variable size regions. We focus on arid regions of the world's continents, characterized by a meager amount of mean annual precipitation (less than ∼100 mm/year) according to a climatology provided by the Global Precipitation Climatology Centre (GPCC (Schneider et al., 2014, Figure 8). We choose four different polygons (see Figure 5 for the outline of the regions) in the Sahara desert (West and East), on the Arabian peninsula, and in the area of the Gobi desert in Central Asia. Following the assumption that in these regions the residual signal in the GRACE and GRACE-FO time series (after subtracting the mean, trend and (semi-)annual signals, see Figure 6) can be  interpreted as an empirical measure of the remaining errors, we compute the temporal standard deviation of the time series.
It has to be noted that these measures should be regarded as upper bounds of the expected errors, as, even in desert regions, residual gravity signals are still present in the data. Particularly in the Gobi region, some interannual variations can be observed as the region reaches the Himalayas. In order to characterize the temporally varying uncertainties of the gravity field solutions caused by, e. g, satellite hardware issues toward the end of the GRACE mission, the standard deviations are computed independently for different mission phases: early GRACE (before  Figure 6 show the resulting standard deviations for the different mission phases for ITSG (yellow) and GFZ (light blue). It can be observed that while the standard deviations are generally lower during the middle part of the GRACE mission, the uncertainties strongly increase toward the end of the mission due to the instrumental degradation. Furthermore, the empirical standard deviations are slightly lower for the ITSG time series than for GFZ except for the second mission phase in the Gobi desert, where the interannual signal appears to dominate the estimation of the standard deviations. Figure 7 shows a comparison of the empirically derived standard deviations with the formal and modeled uncertainties, where the light blue and yellow lines are identical with the respective lines in Figure 6. We conclude that the modeled standard deviations ( ) (red) and ( ) (blue) fit very well to their empirically derived counterparts. Especially for ITSG, the modeled standard deviations are very similar to the empirical ones. Further, the temporal evolution of the uncertainties over the different GRACE and GRACE-FO missions phases can be confirmed nicely. This is particularly true for the two Sahara regions and the Arabian Desert. In contrast, in the Gobi area, the empirical errors are most likely overestimated due to the remaining interannual signal in the residuals. The empirical uncertainty estimates for GFZ are somewhat more pessimistic but still generally confirm the magnitude of ( ) , which are also slightly larger than ( ) . Only for the GRACE end-of-mission phase, the GFZ empirical standard deviations are quite large. However, in this short period, outliers in individual monthly solutions quite strongly dominate the empirical estimates, which should, therefore, be handled with caution. Based on these exemplary results, we conclude that the covariance model derived in Section 3 provides realistic estimates of the standard deviations for basin averages of arbitrarily chosen (desert) regions.

Summary and Practical Implementation
For monthly-mean gravity fields from GFZ RL06 and ITSG-Grace2018, we employ a spatial covariance model to calculate time-variable uncertainty estimates for arbitrarily shaped averaging regions. The resulting uncertainties were tested against estimates rigorously derived from the formal variance-covariance matrices of the ITSG-Grace2018 spherical harmonic coefficients. For 156 globally distributed basins, we find that the GFZ modeled uncertainties are larger than those for ITSG, which is related to the generally higher noise level of GFZ RL06. Temporal variations of both modeled basin uncertainties are well aligned to the temporal variations of the formal uncertainties. The spatial pattern of the basin uncertainties also fits quite well, but especially in large basins with pronounced east-west elongation, the modeled basin uncertainties exceed the formal ones.
We further assess the quality of the uncertainty estimations by comparison to empirical standard deviations in four large polygons in the Sahara, Arabian, and Gobi Deserts. Results indicate the appropriateness of the modeled covariances, even though the analysis underlines that geophysical signals at interannual time scales are sometimes present in GRACE/GRACE-FO data even in distinctly arid regions. We, therefore, conclude that our covariance model is a suitable surrogate for spatial covariances of GRACE and GRACE-FO gridded TWS data for users unwilling to perform a rigorous variance propagation starting from the variance-covariance matrix of the Stokes coefficients.
To further motivate the application of our covariance model, we assess the computational burden required to evaluate the covariance model compared to the full variance propagation of the formal uncertainties: The computational time of the covariance model increased in a quadratic fashion with the size of the considered region. For example, the calculation of the covariances for the region West Sahara takes about 200 s for 240 grid points. This region is already rather large compared to most of the basins considered in this study. It should also be kept in mind that the correlations only have to be computed once and are subsequently scaled by the global grid point standard deviation, which speeds up the processing. For recurring computations, correlations are stored on file for each latitude. On the other hand, the formal covariances are directly inferred from the filtered spherical covariance matrix, allowing rapid computation of the basin uncertainties. For all 156 basins, this calculation takes a few minutes on a single CPU. Computing the whole global covariance matrix and inferring the basin standard deviations thereof would take significantly longer (range of few hours) and would require large storage capacities of 32 GB per month.
The covariance model in its current state has been estimated for TWS data that is not corrected for any leakage, although a leakage correction is available for GFZ's TWS data set. It is generally advised to apply this leakage correction to the TWS data for investigations on regional mean TWS time series. Unfortunately, the leakage correction can only approximate the unknown leakage effect, hindering uncertainty estimation thereof. However, it is reasonable to assume that the overall uncertainty of a mean TWS time series with leakage correction is smaller than for the uncorrected time series. Thus, the provided uncertainty can be understood as a conservative upper bound of the overall uncertainty.
The covariance model given in Table 1 has been recently implemented in GFZ's Gravity Information System (GravIS, gravis. gfz-potsdam.de) for GFZ RL06, which serves as an access point to GRACE and GRACE-FO data for non-geodetic users. Uncertainties based on this model are readily available for the 100 largest discharge basins of the world. Please note that we showed the results for 156 river basins that segment the whole continental land surface in this study. Additionally, signals and associated uncertainties are also available for 92 regional clusters that resemble common characteristics of the prevailing hydroclimate. The GravIS portal additionally provides access to RL01 gravity fields of the International Combination Service for Time-variable Gravity Fields (COST-G). This combined solution is based on several international Level-2 GRACE and GRACE-FO solutions, including GFZ RL06 and ITSG-Grace2018 (Jäggi et al., 2020). The latter series strongly influences COST-G for the GRACE period by having the highest weight in the combination process; we, therefore, assume that the results presented in this study are also transferable to COST-G.
For users interested in the uncertainties of regions not readily available via GravIS, we provide the source code of a Python toolbox (Boergens, 2021; https://git.gfz-potsdam.de/boergens/regional-tws-uncertainty), that offers access to the full covariance information for all points on a regular spatial grid. The Horizon2020 project "Global Gravity-based Groundwater Product" (G3P, www.g3p.eu) will employ the covariance model in this way. This project combines GRACE and GRACE-FO data with satellite-based observations of different hydrological compartments such as soil moisture or surface water to understand groundwater storage variations. To allow for a rigorous combination process of the different remote sensing data sets, reliable covariance estimates of the input data is critically important.