Most Global Gauging Stations Present Biased Estimations of Total Catchment Discharge

Stream gauging stations provide critical streamflow measurements for hydrological applications; however, they may not accurately capture total catchment discharge due to unmonitored regional groundwater flow. Here, we evaluate the effectiveness of streamflow data from gauging stations worldwide to represent total catchment discharge through a modified hydrological model that includes baseflow signatures to constrain groundwater flow processes. We find that approximately 70% of gauging stations present biased estimations of total catchment discharge (bias >10%). This result implies that hydrology‐related processes may not be fully understood, and misleading conclusions may be drawn owing to the low streamflow measurement effectiveness. By influencing subsurface hydrological processes, catchment factors, including catchment area, topography, climate, and geological features, are linked to the effectiveness of streamflow measurements. Our findings highlight the importance of accurate streamflow measurement effectiveness for obtaining a reliable understanding of catchment hydrological processes to support sustainable water resource management.

The hydrology-unclosed feature of catchments appears to be more pronounced than expected (Fan, 2019; Kampf et al., 2020). Evidence of such hydrology-unclosed catchments has come primarily from chemical tracer experiments (Alvarez-Campos et al., 2022), continuous monitoring of groundwater discharge and streamflow (Jasechko et al., 2021; Käser & Hunkeler, 2016), physically based hydrological modeling (Pellicer-Martínez & Martínez-Paz, 2014; Zanon et al., 2014), and water budget closing (Schwamback et al., 2022). Intercatchment groundwater flow is not directly observable; however, it plays a nonnegligible role in streamflow generation in many catchments (Frisbee et al., 2016; Schaller & Fan, 2009), especially in karst areas. Therefore, the monitored streamflow may contain groundwater flow imported from neighboring catchments, or it may represent merely part of the total discharge of the catchment. Additionally, widespread groundwater-surface water exchanges can impair the effectiveness of streamflow observations to represent the total catchment discharge (Condon et al., 2020). These studies have helped to understand the mechanisms of streamflow generation but have not estimated the gauging stations' monitoring capacity (i.e., the range of discharges that a gauging station can constrain) across different environmental conditions. Such estimation has been limited by the lack of monitoring networks for hydrological components, such as evapotranspiration (ET) and groundwater (Kampf et al., 2020; Koppa et al., 2021). Thus, estimating gauging stations' monitoring capacity at a global scale generally relies on hydrological models to simulate major catchment hydrological processes. Accurately estimating a gauging station's monitoring capacity with associated uncertainty in the model's precipitation input has also proven to be highly challenging (Safeeq et al., 2021; Sun et al., 2018).
In this study, baseflow signatures reflecting groundwater flow were separated from daily streamflow records observed at 1,761 gauging stations around the globe, and then coupled into a modified hydrological model incorporating intercatchment groundwater exchange and groundwater-surface water interaction processes. Additionally, we evaluated and merged four global precipitation products from different data sources to constrain the uncertainty in the model's precipitation input. Subsequently, we defined the hydrological closure index (HCI) as the ratio of monitored streamflow to total catchment discharge to quantitatively evaluate a gauging station's capacity to monitor total catchment discharge. The proposed HCI, based on the reliable model output for 1,038 catchments, was validated using the independent Budyko framework. Finally, we (a) mapped the spatial distribution of the HCI globally, (b) identified the key catchment factors and hydrological processes influencing the HCI, and (c) explored the possible impacts of biased estimations of total catchment discharge in typical hydrological applications.

Data Sets
Streamflow records from 10,702 gauging stations and associated catchment boundary data were obtained from the Global Runoff Data Centre. We used four criteria (Text S1 in Supporting Information S1) to select high-quality streamflow and catchment characteristic data, retaining 1,761 catchments worldwide to run the hydrological model.
Precipitation is a crucial component in hydrological simulations, but accurately estimating it at a global scale remains challenging, particularly for individual precipitation products (Koster et al., 2021). To address this issue, a triple collocation (TC) method was employed to merge multiple precipitation datasets (Lyu et al., 2021), including gauge-based, satellite-based, and reanalysis datasets. Using 30 catchments with relatively sufficient precipitation observations, we found that the merged precipitation based on the TC method notably improved the reliability of precipitation estimation (Text S2 in Supporting Information S1).
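The merging idea can be illustrated with a minimal sketch of covariance-based triple collocation followed by inverse-error-variance weighting. It assumes three collocated products whose errors are independent of each other and of the true signal, and that the products share a common scale. The function names and the weighting rule are illustrative assumptions, not the procedure of Text S2 in Supporting Information S1, which merges four products.

```python
from statistics import mean


def covariance(a, b):
    """Sample covariance of two equally long series."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def tc_error_variances(x, y, z):
    """Error variance of each product (covariance form of triple collocation).

    Assumes the three error series are independent of each other and of the truth.
    """
    cxy, cxz, cyz = covariance(x, y), covariance(x, z), covariance(y, z)
    return (
        covariance(x, x) - cxy * cxz / cyz,
        covariance(y, y) - cxy * cyz / cxz,
        covariance(z, z) - cxz * cyz / cxy,
    )


def merge_products(products, error_vars):
    """Weight each product by the inverse of its error variance (assumed rule)."""
    if any(v <= 0 for v in error_vars):
        raise ValueError("error variances must be positive to form weights")
    weights = [1.0 / v for v in error_vars]
    total = sum(weights)
    return [
        sum(w * p[t] for w, p in zip(weights, products)) / total
        for t in range(len(products[0]))
    ]


# Synthetic data: one shared signal plus three mutually orthogonal error patterns.
truth = [1, 1, 1, 1, -1, -1, -1, -1]
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]
e3 = [1, -1, -1, 1, 1, -1, -1, 1]
x = [t + 0.1 * e for t, e in zip(truth, e1)]
y = [t + 0.2 * e for t, e in zip(truth, e2)]
z = [t + 0.3 * e for t, e in zip(truth, e3)]

error_vars = tc_error_variances(x, y, z)
merged = merge_products([x, y, z], error_vars)
```

In this synthetic case the product with the smallest error amplitude receives the largest weight. With real data, sampling noise can produce negative error-variance estimates, which the sketch rejects rather than corrects.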
Other catchment data used in this study can be found in Text S3 in Supporting Information S1. Note that these catchment data were averaged for each catchment, except for latitude, which represents the absolute value of the catchment center (LAT).

Mabcd Model
We selected the abcd model (Thomas, 1981), a lumped conceptual model that represents key hydrological processes, such as soil moisture balance and groundwater recharge and discharge, at monthly time steps, to simulate catchment hydrological processes. The original abcd model is weak in simulating subsurface hydrological processes because it lacks intercatchment groundwater exchange and groundwater-surface water interaction processes. Thus, we developed a modified abcd model (Mabcd) by including (a) two groundwater storages with different recharge and discharge rates (Stoelzle et al., 2015); (b) a process controlling groundwater import from outside the catchment; and (c) a process that determines the proportion of groundwater discharge entering the stream channel. Moreover, temperature-based snow and melt processes were included (Martinez & Gupta, 2010). With the Mabcd, we could simulate both the streamflow monitored by the gauging station in the channel and the total catchment discharge, which includes monitored surface and subsurface discharges and unmonitored groundwater flow. More details about the Mabcd are given in Text S4 in Supporting Information S1.
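For orientation, one monthly step of the original abcd formulation (Thomas, 1981) can be sketched as follows. This is only the baseline model: none of the Mabcd additions (two groundwater storages, groundwater import, channel partitioning, snow and melt) are represented, and the variable names and example values are illustrative assumptions.

```python
import math


def abcd_step(precip, pet, soil_prev, gw_prev, a, b, c, d):
    """Advance the original abcd model by one month (all quantities in mm).

    Expected parameter ranges: 0 < a <= 1, b > 0, 0 <= c <= 1, d >= 0.
    """
    available = precip + soil_prev                     # W: available water
    half = (available + b) / (2.0 * a)
    # Y: evapotranspiration opportunity; max() guards against rounding below zero.
    opportunity = half - math.sqrt(max(half * half - available * b / a, 0.0))
    soil = opportunity * math.exp(-pet / b)            # end-of-month soil moisture
    et = opportunity - soil                            # actual evapotranspiration
    surplus = available - opportunity                  # water leaving the soil store
    recharge = c * surplus                             # share going to groundwater
    direct_runoff = (1.0 - c) * surplus                # share going to direct streamflow
    groundwater = (gw_prev + recharge) / (1.0 + d)     # end-of-month groundwater storage
    baseflow = d * groundwater                         # groundwater discharge
    return {
        "et": et,
        "soil": soil,
        "groundwater": groundwater,
        "direct_runoff": direct_runoff,
        "baseflow": baseflow,
        "streamflow": direct_runoff + baseflow,
    }


# Hypothetical inputs: 100 mm precipitation, 60 mm PET, initial storages of 50 and 20 mm.
state = abcd_step(100.0, 60.0, 50.0, 20.0, a=0.98, b=250.0, c=0.4, d=0.1)
```

In this formulation, parameter c splits the soil-water surplus between direct streamflow and groundwater recharge, and parameter d scales groundwater discharge.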
To calibrate the Mabcd in both surface and subsurface hydrological processes, we separated direct streamflow and baseflow from the observed daily streamflow. Subsequently, the simulated monthly direct streamflow and baseflow were calibrated using this observed direct streamflow and baseflow (Text S5 in Supporting Information S1). Moreover, the Sobol global sensitivity analysis method (Sobol', 2001) was used to analyze the sensitivity of streamflow simulations to the parameters in the Mabcd (Text S6 in Supporting Information S1).
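The separation step can be illustrated with a single forward pass of the Lyne-Hollick recursive digital filter. This filter is an assumption chosen for illustration only; the separation method actually used is documented in Text S5 in Supporting Information S1. The filter parameter `alpha`, the function name, and the example flows are likewise hypothetical.

```python
def lyne_hollick_baseflow(flow, alpha=0.925):
    """Split a daily flow series into baseflow with one forward filter pass.

    The first value is treated as pure baseflow (an initialization assumption).
    """
    if not flow:
        return []
    quick_prev = 0.0
    baseflow = [flow[0]]
    for prev, cur in zip(flow, flow[1:]):
        quick = alpha * quick_prev + 0.5 * (1.0 + alpha) * (cur - prev)
        quick = min(max(quick, 0.0), cur)   # keep quickflow within [0, Q]
        baseflow.append(cur - quick)
        quick_prev = quick
    return baseflow


# Hypothetical daily flows containing one storm peak.
daily_flow = [2.0, 2.0, 10.0, 6.0, 4.0, 3.0, 2.5, 2.0]
daily_baseflow = lyne_hollick_baseflow(daily_flow)
daily_direct = [q - b for q, b in zip(daily_flow, daily_baseflow)]
```

The two daily components could then be aggregated to monthly totals and compared with the simulated monthly direct streamflow and baseflow during calibration.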

HCI Estimation
The hydrological closure index (HCI) was proposed to quantify the capacity of the gauging station to monitor total catchment discharge. It is defined as the ratio of monitored streamflow (R) to total catchment discharge (TR):
$$\mathrm{HCI} = \frac{R}{TR} \tag{1}$$
where TR, which is estimated at the average annual time step from the Mabcd model or Budyko framework, represents the sum of monitored surface and subsurface discharges and unmonitored groundwater flow; R represents the streamflow that the gauging station can monitor in the channel. An HCI value less than 1 indicates that only part of the total catchment discharge is monitored by the gauging station, reflecting an underestimation of the total catchment discharge, while a value greater than 1 indicates an overestimation.

HCI Validation
The estimated HCI based on the Mabcd was validated using the independent Budyko framework. The framework with a single parameter can effectively reflect the relationship between the long-term water and energy balance and calculate the total catchment discharge (Fu, 1981):
$$TR = P - ET = P\left[1 + \left(\frac{PET}{P}\right)^{\omega}\right]^{1/\omega} - PET \tag{2}$$
where P and PET are the precipitation and potential evapotranspiration at an average annual time step, respectively; ET is the actual evapotranspiration; ω is the Budyko parameter, which reflects the hydrological effects of catchment factors and is, in turn, shaped by these factors (Li et al., 2013; Xu et al., 2013). We can obtain the independent HCI by combining Equations 1 and 2.
However, accurately estimating ω in Equation 2 poses challenges. The values for actual total catchment discharge and actual ET remain unknown. In hydrology-closed catchments, the measured streamflow closely approximates the total catchment discharge, facilitating the accurate estimation of ω based on Equation 2. Furthermore, in hydrology-unclosed catchments, ω can be estimated using models established in hydrology-closed catchments. Therefore, we used two distinct validation methods, each with different criteria, to identify hydrology-closed catchments and establish prediction models for ω. The specific details of these validation approaches are described below, and their differences can be found in Table S1 in Supporting Information S1.
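The role of ω can be illustrated with a minimal sketch that assumes Fu's (1981) one-parameter form of the Budyko curve, ET/P = 1 + PET/P − [1 + (PET/P)^ω]^(1/ω). The forward function gives total catchment discharge for a known ω. The inverse function recovers ω by bisection in a catchment treated as hydrology-closed, where monitored streamflow stands in for total discharge. Function names, search bounds, and example values are illustrative assumptions.

```python
def fu_total_discharge(precip, pet, omega):
    """Mean annual total discharge TR = P - ET from Fu's Budyko curve."""
    aridity = pet / precip
    et_ratio = 1.0 + aridity - (1.0 + aridity ** omega) ** (1.0 / omega)
    return precip * (1.0 - et_ratio)


def fit_omega(precip, pet, discharge, lo=1.0001, hi=50.0, tol=1e-8):
    """Find omega such that Fu's curve reproduces the given discharge.

    Total discharge decreases monotonically as omega grows, so bisection applies.
    """
    if not (fu_total_discharge(precip, pet, hi) <= discharge
            <= fu_total_discharge(precip, pet, lo)):
        raise ValueError("discharge is outside the range reachable by the curve")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fu_total_discharge(precip, pet, mid) > discharge:
            lo = mid   # curve still yields too much discharge: increase omega
        else:
            hi = mid
    return 0.5 * (lo + hi)


def budyko_hci(monitored, precip, pet, omega):
    """HCI as monitored streamflow divided by Budyko-based total discharge."""
    return monitored / fu_total_discharge(precip, pet, omega)


# Hypothetical catchment: P = 800 mm/yr, PET = 1200 mm/yr, omega = 2.6.
total = fu_total_discharge(800.0, 1200.0, 2.6)
omega_back = fit_omega(800.0, 1200.0, total)
```

If monitored streamflow differs from total discharge, the same inversion returns a biased ω, which is why the choice of hydrology-closed catchments matters for the validation.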
Large catchments are considered to be approximately hydrology-closed (Validation 1). To estimate ω, we utilized two prediction models that were previously established in these large catchments through other studies, including a linear regression model (LR) (Li et al., 2013) and a neural network model (NN1) (Xu et al., 2013). The linear regression model was fitted using vegetation coverage in 26 major global catchments (>300,000 km²). The neural network model was trained using three catchment characteristics, including slope, normalized difference vegetation index (NDVI), and LAT, in a combined catchment data set including 224 small (100-10,000 km²) and 32 large catchments (>230,000 km²).
Hydrology-closed catchments can be identified by the HCI derived from the Mabcd (Validation 2). We grouped 44 catchment datasets based on the proportion of hydrology-closed catchments, ranging from 0% to 100% (Text S7 in Supporting Information S1), and established 44 linear regression models for ω with three catchment characteristics (slope, NDVI, and LAT). We also established 44 linear regression models by modifying the monitored streamflow using the HCI and Equation 1 to obtain the total catchment discharge in the ω estimation process.
Additionally, to improve the estimation of ω, we trained a neural network model (NN2) using the modified streamflow in almost half of the catchments, with catchment slope, NDVI, and LAT as predictors. The resulting HCI showed high consistency with the HCI from the Mabcd (R² = 0.83, Figure 1e). More details are given in Text S8 in Supporting Information S1.

Estimation and Validation of the HCI
We used the Mabcd to simulate major catchment hydrological processes and estimate the effectiveness of streamflow observations to represent the total catchment discharge (see Methods). Simulations of catchment streamflow were performed on 1,761 catchments globally, yielding median values of monthly streamflow Nash-Sutcliffe efficiency (NSE) of 0.78 in the calibration and 0.73 in the validation (Figure 1a). Calibrated parameters are presented in Figure S1 in Supporting Information S1. The relatively poor performance over the high plains in the USA was consistent with existing studies (Mai et al., 2022). In the following analyses, we excluded catchments with poor model performance (i.e., NSE values less than 0.6 in either calibration or validation), retaining 1,038 reliable catchments with a median absolute percent bias (PBIAS) of 2.43%. A comparison between the simulated and observed streamflow in typical areas is presented in Figure S2 in Supporting Information S1. The model performance in simulating catchment hydrological processes was evaluated by comparing the average annual ET derived from the Mabcd (Mabcd-ET) and that from MODIS (MODIS-ET). Overall, Mabcd-ET showed good consistency with MODIS-ET (R² = 0.76). Nevertheless, there was a notable bias in high-temperature catchments, where MODIS tended to present a biased estimation of ET (Figure S3 in Supporting Information S1).
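The two performance metrics and the screening rule described above can be written compactly. This is a sketch using the standard definitions of NSE and percent bias; the sign convention for PBIAS (positive when the simulation exceeds the observation) and the function names are assumptions.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def pbias(observed, simulated):
    """Percent bias; positive values mean the simulation is too high (assumed sign)."""
    return 100.0 * sum(s - o for o, s in zip(observed, simulated)) / sum(observed)


def is_reliable(nse_calibration, nse_validation, threshold=0.6):
    """Keep a catchment only if NSE reaches the threshold in both periods."""
    return nse_calibration >= threshold and nse_validation >= threshold
```

For example, a catchment with an NSE of 0.78 in calibration but 0.55 in validation would be excluded under the 0.6 threshold.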
We validated the HCI from the Mabcd using the Budyko framework with two methods to identify hydrology-closed catchments and estimate ω. In Validation 1, ω can be accurately estimated in large catchments, where monitored streamflow is approximately equal to the actual total catchment discharge. Consequently, a prediction model for parameter ω established in large catchments could estimate ω and the HCI in catchments with different sizes. The HCI from the Mabcd was compared with that from the Budyko framework with parameter ω fitted with catchment factors through a linear regression model (LR) established in large catchments (Li et al., 2013) and a neural network model (NN1) established in both large and small catchments (Xu et al., 2013) (see Methods). The nonlinear relationship between the HCIs from the Mabcd and Budyko-LR was strong (R² = 0.62), with a notable bias in steep catchments (Figure 1c). The bias between the HCIs from the Mabcd and Budyko-NN1 was more restrained, while the R² was weaker (Figure 1d). This result implies that, compared to the NN1 model, the simple LR model with only one predictor (vegetation) cannot accurately estimate parameter ω in steep catchments. Additionally, the catchments used to train the NN1 model may carry some uncertainty due to their hydrology-unclosed features, yielding a weaker R². Moreover, our results indicate that monitored streamflow in large catchments deviates from actual total catchment discharge (see the following section), partially invalidating Validation 1. Overall, Validation 1 cannot adequately evaluate our results owing to the uncertainty in determining the hydrology-closed catchments and parameter ω, despite the close relationship between the HCIs from the Mabcd and Budyko.
In Validation 2, the HCI from the Mabcd can determine whether a catchment is hydrology-closed. Thus, using catchment datasets across different proportions of hydrology-closed catchments determined by the HCI from the Mabcd, we establish multiple prediction models for Budyko parameter ω and further estimate the HCI (see Methods). The results showed that the relationship (R²) between the HCIs from the Mabcd and Budyko significantly depended on the proportion of hydrology-closed catchments in the catchment data set used to fit parameter ω (R² = 0.97, p < 0.001), with inferior relationships in low proportions of hydrology-closed catchments (Figure 1b). This result implies that parameter ω is misestimated in the hydrology-unclosed catchments, demonstrating the effectiveness of the HCI derived from the Mabcd. Moreover, modifying streamflow data by our HCI to obtain the total catchment discharge improved such relationships in low proportions, with a decreased slope (k) from 0.34 to −0.01 (Figure 1b), suggesting that the misestimation of parameter ω was caused by the hydrology-unclosed feature of catchments, thus supporting the above conclusion. These results confirm Validation 2, that is, that the HCI from the Mabcd can determine whether a catchment is hydrology-closed.
In summary, based on the NSE values and these validation processes, the Mabcd can effectively capture the major catchment hydrological processes and simulate the total catchment discharge worldwide, and the proposed HCI based on the Mabcd is an effective index for quantifying a gauging station's capacity to monitor the total catchment discharge.

Global Distribution of the HCI
Finally, we generated a global map of gauging stations' monitoring capacity for total catchment discharge, quantified as the HCI, that is, the ratio between monitored streamflow and the actual total catchment discharge (Figure 2d). A value of HCI approaching 1 means that the gauging station can accurately monitor the total catchment discharge, indicating a hydrology-closed catchment, while a value deviating from 1 means that the total catchment discharge is overestimated or underestimated, that is, a hydrology-unclosed catchment. Overall, 31.7% of gauging stations could accurately monitor the total catchment discharge (0.9 < HCI < 1.1), with a monitored bias of less than 10%. In contrast, 25.63% and 17.44% of gauging stations had moderate overestimations (1.1 < HCI < 1.3) and underestimations (0.7 < HCI < 0.9), respectively, with biases ranging from 10% to 30%, and a further 9.15% and 16.09% exhibited strong overestimations (HCI > 1.3) and underestimations (HCI < 0.7), respectively, with biases greater than 30% (Figure 2d).
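The five classes above follow directly from the HCI thresholds and can be expressed as a small classifier. This is a sketch: the class labels, the handling of values exactly on a boundary, and the example HCI values are assumptions, since the text specifies only open intervals.

```python
from collections import Counter


def classify_hci(hci):
    """Assign an HCI value to one of the five bias classes."""
    if hci < 0.7:
        return "strong underestimation"
    if hci < 0.9:
        return "moderate underestimation"
    if hci <= 1.1:
        return "accurate"          # boundary values 0.9 and 1.1 are placed here by assumption
    if hci <= 1.3:
        return "moderate overestimation"
    return "strong overestimation"


def class_shares(hci_values):
    """Percentage of stations falling in each class."""
    counts = Counter(classify_hci(v) for v in hci_values)
    return {label: 100.0 * n / len(hci_values) for label, n in counts.items()}


# Hypothetical HCI values for ten stations.
shares = class_shares([0.34, 0.6, 0.82, 0.95, 0.99, 1.02, 1.05, 1.16, 1.25, 1.4])
```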
Globally, the 25th, 50th, and 75th quantiles of the HCI were 0.82, 0.99, and 1.16, respectively. In North America, Australia, Europe, and Asia, the median HCI was close to 1 (0.97-1.05), but the HCI was notably lower in South America and Africa, with medians of 0.6 and 0.34, respectively (Figure S4 in Supporting Information S1). Regional differences in the HCI were also observed. For example, the HCI in the northern Appalachian Mountains was generally higher than that in the southern Appalachian Mountains (Figure 2a), while the HCI in the southern Brazilian Plateau was higher than that in the northern Brazilian Plateau (Figure 2b). However, this regional difference was weak in most other regions, which displayed strong heterogeneity, such as in the Alps and its northern plains (Figure 2c).
Figure 2 (partial caption). The bottom values represent the range of each equally sized, ordered group (see also Figure S5 in Supporting Information S1). "UG" and "G" in panel f represent the unglaciated catchment and glaciated catchment, respectively.
We found that the spatial distribution of the HCI partly depended on the catchment area, elevation and climate. The 1,038 selected catchments were divided equally into six groups based on the order of these catchment factors (Figures 2e-2g). The value of the HCI decreased with increasing catchment area, showing a general overestimation in small catchments (<500 km²), an underestimation in large catchments (>35,000 km²), and a relatively accurate estimation in moderate-sized catchments (Figure 2e and Figure S5a in Supporting Information S1). An overall underestimation was observed in upland catchments, despite the widespread presence of some glacial catchments that may lead to an overestimation (Figure 2f and Figure S5b in Supporting Information S1). The HCI was also influenced by climate, which was represented by the aridity index, with higher values in wet catchments and lower values in dry catchments (Figure 2g and Figure S5c in Supporting Information S1).
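The grouping used for this comparison, ordering catchments by a factor and splitting them into six equally sized groups, can be sketched as follows. The function name, the use of the median as the group summary, and the example values are illustrative assumptions.

```python
from statistics import median


def grouped_medians(factor, hci, n_groups=6):
    """Order catchments by a factor, split them into equal groups, summarize the HCI.

    Returns one (factor minimum, factor maximum, median HCI) tuple per group.
    """
    order = sorted(range(len(factor)), key=lambda i: factor[i])
    groups = []
    for k in range(n_groups):
        start = k * len(order) // n_groups
        end = (k + 1) * len(order) // n_groups
        members = order[start:end]
        groups.append((
            factor[members[0]],
            factor[members[-1]],
            median(hci[i] for i in members),
        ))
    return groups


# Hypothetical catchment areas (km²) and HCI values for twelve catchments.
areas = [120, 10, 50, 30, 70, 20, 90, 40, 110, 60, 80, 100]
hcis = [0.6, 1.4, 1.0, 1.2, 0.95, 1.3, 0.8, 1.1, 0.7, 1.05, 0.9, 0.75]
area_groups = grouped_medians(areas, hcis)
```

With 1,038 catchments and six groups, each group would contain 173 catchments.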
The regional difference in the HCI may be regulated by geologic factors, such as subsurface permeability, faults, and dipping strata, which influence surface and subsurface channel networks (Hergarten et al., 2016; Lovill et al., 2018). The general overestimation in small catchments may be attributed to the high proportion of groundwater flow imported from outside the catchment in monitored streamflow, which is typically expressed in spring flow (Abrams et al., 2009; Frisbee et al., 2016) and preferential flow (Angermann et al., 2017). In contrast, the widespread potential loss of streamflow into underlying aquifers (Jasechko et al., 2021) can lead to underestimation in large catchments, challenging our understanding of the hydrology-closed catchment at a large scale. Moreover, deep aquifers in upland and dry catchments partially drive the underestimation in these catchments (Fan, 2019). It is worth noting that glacial meltwater is considered an additional water source in this study and may be an important water component in streamflow (Andermann et al., 2012), causing a high HCI in some glaciated catchments with high elevations (Figure 2f). In fact, the gauging stations at high elevations tended to underestimate the total catchment discharge after excluding the glaciated catchments with HCI values greater than 1 (Figure S6 in Supporting Information S1).

Key Hydrological Processes That Control the HCI
The effects of different hydrological processes reflected by model parameters on the HCI can be revealed based on three key parameters (Figure 3). First, the parameter c, controlling direct streamflow and groundwater recharge processes, was negatively correlated with the HCI (R² = 0.65, p < 0.001), which suggests that the gauging station's monitoring capacity was partly influenced by the groundwater recharge process (Figure 3b). For example, the gauging stations in catchments with high groundwater recharge ratios tended to underestimate the total catchment discharge. Second, the parameter K_g, controlling groundwater flow import from outside the catchment, was positively correlated with the HCI (R² = 0.4, p < 0.001, Figure 3c). Third, parameter K_b, which controls groundwater flow into the stream channel, was positively correlated with the HCI (R² = 0.37, p < 0.001, Figure 3d). Overall, the three hydrological processes explain the spatial differences in the gauging stations' monitoring capacity.
Catchment factors influenced these three hydrological processes and further regulated the HCI. For example, spatial scale controls of hydrological processes and the HCI were clearly observed. In small catchments, the capacity of gauging stations to monitor the total catchment discharge was highly sensitive to the imported groundwater flow, which led to a high HCI. However, in large catchments, groundwater discharge that does not enter the stream channel, such as submarine groundwater discharge (Moore et al., 2008), contributed to a low HCI. Additionally, local climate, land cover, and topography play key roles in controlling the HCI by influencing groundwater recharge and stream-groundwater connectivity (Ivkovic, 2009). However, the unmonitored deep groundwater discharge is less controlled by these land surface characteristics, but limited by the permeability of bedrock (Iwasaki et al., 2021).

Limitations and Uncertainties
We identified important hydrological processes for streamflow simulations characterized by the total sensitivity index (TSI) of parameters in the Mabcd. Globally, the four most important hydrological processes for streamflow simulations in the Mabcd were evapotranspiration (parameter a), groundwater discharge (parameter d), groundwater import (parameter K_g), and groundwater recharge (parameter c), with median TSI values of 0.28, 0.27, 0.22, and 0.1, respectively (Figures S7a, S7c, S7e, S7g, and S7l in Supporting Information S1). The model parameters governing important hydrological processes for streamflow simulations, that is, evapotranspiration and groundwater discharge processes, did not exhibit a good relationship with the HCI (Figure S8 in Supporting Information S1). It is crucial to note that this outcome does not imply the absence of influence from these two hydrological processes on the HCI. For example, the lower HCI observed in arid catchments can be partly attributed to evapotranspiration (see Section 3.2). One key aspect contributing to this observation is the potential offsetting effect of the two hydrological processes over long timescales, in addition to the inadequate representation of their influence by the model parameters. While the Mabcd model has demonstrated its effectiveness in assessing whether a catchment is hydrology-closed, its lumped nature may lead to a certain level of inaccuracy in capturing the true hydrological dynamics. Additionally, our limited understanding of subsurface flow patterns restricts our capacity to obtain a comprehensive understanding of the processes that impact the HCI.
Although the Mabcd model has been shown to be reliable in estimating the total catchment discharge and HCI, supported by the use of MODIS-ET and the Budyko framework, there are still uncertainties due to the absence of direct validation using actual total catchment discharge data.

Implications for Hydrology-Related Applications
Figure 3. Relationships between the HCI and three key parameters, including parameter c (b) controlling the groundwater recharge process, parameter K_g (c) controlling the groundwater import process, and K_b (d) controlling the groundwater discharge entering the stream channel or flowing out of catchments through non-channel areas.
Gauging stations are key tools for monitoring channel streamflow and are critical in hydrological applications. However, a gauging station may not sufficiently capture the total catchment discharge due to intercatchment groundwater flow. This ambiguity can introduce biases in the simulation of hydrological processes and in the estimation of catchment hydrological responses to changing environmental factors such as climate and vegetation. Our results suggest that only 31.7% of the gauging stations worldwide accurately monitor the actual total catchment discharge (bias <10%). This result implies that most gauging stations overestimate or underestimate the actual total catchment discharge to varying degrees; thus, the reliability of hydrology-related studies performed in hydrology-unclosed catchments, which are unexpectedly widespread, is highly likely to be weakened by the biased estimations of total catchment discharge.
The direct impact of the bias in gauging station observations of the total catchment discharge can be quantified. For example, the Budyko position (BP) in 40% of gauging station observations changed significantly (BP > 0.1) due to the bias, with some observations falling beyond the Budyko boundary (Figure S9a in Supporting Information S1); additionally, model performance was notably reduced in catchments with a high bias, where the estimation error of evapotranspiration was also high (Figures S9b-S9d in Supporting Information S1). However, the indirect impact of the bias on hydrological applications, such as evaluations of the hydrological effects of climate change, human activities and vegetation change, is currently difficult to quantify. The different responses of subsurface and surface flow to environmental changes (Hare et al., 2021; Sulis et al., 2012) account for the indirect impact. This implies that the catchment hydrological response to the changing environment in hydrology-unclosed catchments, as observed by the gauging stations, has been overestimated or underestimated to varying degrees. Given this, hydrologists may not fully understand the trends in the changing hydrological cycle and may even draw misleading conclusions. Therefore, it is necessary to highlight the importance of accurate streamflow measurement of total catchment discharge for obtaining a reliable understanding of catchment hydrological processes to support sustainable water resource management.

Conclusions
Overall, we generated a global map of gauging stations' monitoring capacity for the total catchment discharge and found that approximately 70% of gauging stations presented biased estimations of total catchment discharge (bias >10%). This result implies that hydrology-related processes may not be fully understood, and misleading conclusions could be drawn owing to the low effectiveness of streamflow observations to represent total catchment discharge in most catchments worldwide. Our findings also suggest that a gauging station's monitoring capacity strongly depends on catchment hydrological processes such as groundwater recharge, groundwater import from outside the catchment, and groundwater flow into the stream, and these processes are also influenced by catchment factors, including catchment area, topography, climate, and geological features.