Improving the Applicability of Lumped Hydrological Models by Integrating the Generalized Complementary Relationship

Lumped hydrological models (LHMs) are indispensable for water resource planning and environmental studies due to simple structures and robust performances. LHMs commonly focus on runoff processes with crude representations for other hydrological processes, such as evapotranspiration (E). Therefore, these models may yield unrealistic water balance partitioning. The challenge is to enhance the LHMs performance while retaining simplicity. The generalized complementary relationship (GCR) is a simple and robust method for estimating E. This study attempted to incorporate GCR into four widely used LHMs (Australian water balance model, GR2M, SIMHYD, and TANK) to test whether the integrated models (GCR‐LHMs) can improve runoff simulation at little cost to the model structure and data requirement. Original LHMs and integrated GCR‐LHMs were tested in 2112 catchments over various climatic conditions. Results show that the GCR‐LHMs outperform original LHMs in most catchments (77.7 ± 5.0%). In addition, the number of catchments that GCR‐LHMs have qualified performance (i.e., Kling‐Gupta coefficient [KGE] more than 0.5) increased by 10.7 ± 3.6% compared with LHMs. The performance of original and integrated models is dependent on the aridity index and normalized vegetation index. However, the improvement in model performance is less catchment characteristics dependent. These results indicate that incorporating GCR into the LHMs improves the model performance under different climatic and vegetation conditions and justifies the integration. GCR integration with LHMs can improve runoff estimation ability (with higher KGE and R‐Square) while retaining model simplicity and readily available input. These findings are valuable for improving the applicability and accuracy of LHMs.

• Four widely used lumped hydrological models are integrated with generalized complementary relationship • Integration improves model applicability while retaining the simplicity of model structure and readily available input • Improved model performance is largely independent of catchment characteristics indicating the general validity of the integration Supporting Information: Supporting Information may be found in the online version of this article.
importance of consistent process representation in LHMs.To this end, it is desirable to incorporate a conceptual model for estimating E.
The generalized complementary relationship (GCR) based on the complementary assumption of Bouchet (1963), is a simple conceptual model for estimating regional E by using routinely observed meteorological variables.Basically, there are four typical GCR functions based on different derivations of the complementarity principle and boundary conditions, that is, polynomial (Brutsaert, 2015; denoted as B15), sigmoid (Han et al., 2012;Han & Tian, 2018; denoted as H18), exponential (Gao & Xu, 2021; denoted as G21), and power functions with no-parameter (denoted as S0), one-parameter (denoted as S1) and two-parameter (denoted as S2) (Szilagyi & Crago, 2023;Szilagyi et al., 2022).These functions have no or only one or two adjustable parameters and have been applied to estimate E reliably across various time scales (Brutsaert et al., 2020;Li et al., 2020;Liu et al., 2016;Zhang et al., 2017).Thus, integrating GCR into LHMs (GCR-LHMs) can improve E estimation without compromising the parsimony and robustness of the LHMs (Lei et al., 2023).Such attempts have rarely been evaluated across different climate conditions and LHMs.
This study selected four widely used LHMs (AWBM, GR2M, SIMHYD, and TANK) to integrate with GCR and to evaluate the effectiveness and applicability of GCR-LHMs in different climate conditions.Long-term and high-quality hydrometeorological data from 2112 catchments worldwide with various climatic conditions were collected to test the model performance.All model performances are assessed at the monthly time scale due to the absence of confluence submodule in runoff simulation.The primary objectives of this study are: (a) to integrate GCR and LHMs (i.e., GCR-LHMs); (b) to diagnose the performance and applicability of integrated GCR-LHMs across different hydrometeorological conditions; and (c) to investigate the dependence of GCR-LHMs performance on catchment characteristics.

Lumped Hydrological Models
The AWBM, GR2M, SIMHYD, and TANK models were used to evaluate the effectiveness and applicability of integrating GCR and LHMs.These models have distinct structures and complexity and are widely used worldwide.Diagrams of the structure of four LHMs are shown in Figure 1, and model parameters are listed in Table 1.
The inputs of the models include precipitation (P) and E pa , outputs are Q and E. This study is conducted on a monthly scale.

Australian Water Balance Model (AWBM)
The AWBM is developed from saturation runoff generation concepts and is one of the most widely used hydrological models in Australia (Boughton, 2004).AWBM has eight parameters.Figure 1a shows the model structure.
The AWBM has three individual surface storages and a baseflow storage with unlimited capacity.The three individual surface storages have different area ratios (A 1 , A 2 , and A 3 , satisfying A 1 + A 2 + A 3 = 1) and capacities (C 1 , C 2 , and C 3 ).Surface storage is replenished by P and depleted by E. When the water supply is unlimited, E = E pa ; otherwise, E < E pa and varies with S and P. The excess water (EXC) of surface storage is partitioned into routing and baseflow storage depending on a single parameter baseflow index (BFI).Total Q is the sum of surface flow (Q s ) and baseflow (Q b ), where Q s and Q b are linearly related to water storage, with a surface flow recession constant of K s and baseflow recession constant of K b (Boughton & Chiew, 2007;Yu & Zhu, 2015).

GR2M Model
The GR2M model is a monthly LHM (Mouelhi et al., 2006) developed from a four-parameter daily rainfall-runoff model (GR4J).GR2M has only two parameters (X 1 and X 2 ); the model structure is shown in Figure 1b.The model has been successfully used in many worldwide catchments with significant climate diversity (Demirel et al., 2013).In GR2M, Q is non-linear with the water content of production and routing storage, and the routing storage capacity is set at a fixed value of 60 mm.Soil moisture storage (S) is replenished by P and depleted by E. Due to precipitation, S becomes S 1 , where S 1 is a non-linear function of S, P, and X 1 .Due to E, S 1 becomes S 2 , where S 2 is a non-linear function of S 1 , E pa , and X 1 .The difference between S 1 and S 2 within a time step is E.

SIMHYD Model
The SIMHYD is a simplified version of the LHM with seven parameters (Chiew et al., 2002(Chiew et al., , 2003)).SIMHYD has been widely investigated across diverse catchments.The model considers infiltration excess Q and saturation excess Q and contains three stores for interception loss, soil moisture, and groundwater, as shown in Figure 1c.
LEI ET AL.Zhang et al., 2008).The last two components are modeled as a linear function of the ratio S and SM.Note that the SIMHYD used in this study is the simple version without a confluence module.

TANK Model
The TANK model was originally developed and introduced by Sugawara (1967) and has been widely used worldwide.The model structure is very flexible and can be customized for different numbers of tanks, water outlets, and storage and drainage capacity sizes.The TANK model in this study has two tanks, that is, surface tank and base tank, which are in linear vertical series, as shown in Figure 1d.The surface tank has two horizontal outlets on the side and a vertical outlet at the bottom; the base tank has one horizontal outlet.Water through the side outlets forms Q s , Q I , and Q b (from top to bottom), and the sum of the three is total Q.Infiltration is represented by flows between tanks through the bottom outlet.The TANK model assumes that flows and infiltration are linear functions of water height above outlets (Sugawara, 1967(Sugawara, , 1995)).E = E pa if water is sufficient; otherwise, E < E pa is limited by surface tank water storage and precipitation.The model has eight parameters and can be divided into three categories: (a) outlet coefficients (a 1 , a 2 , a 3 , a 4 ); (b) storage heights of horizontal outlets (h 1 , h 2 ); and (c) maximum storage heights of tanks (H 1 , H 2 ; Lee et al., 2020).

Generalized Complementary Relationship
The GCR method estimates actual E based on a relationship between three types of E, that is, actual E, potential E (E po ), and E pa .The relationship can be summarized as follows: a decreasing E will lead to an increasing E pa by alteration of the overlying air temperature and humidity (Brutsaert, 2015), which is non-linear and asymmetric.
In the past decade, four typical asymmetric nonlinear complementary relationships (shown in Table 2) have been developed based on different boundary conditions and mathematical forms.
The E pa is E from a small, saturated surface within a large, normally drying surface; it can be observed by the evaporating dish (Kahler & Brutsaert, 2006).However, E pan is difficult to obtain.This study estimated E pa using the Penman (1948) equation, as in Brutsaert et al. (2020): The complementary deviations of E and E pa are from E po , which is potential E under the condition of an adequate water supply in the area, or E from wet regional surfaces.Note that this study adopts the nomenclature of Brutsaert (2015).E po is difficult to estimate using only the information and observations of ambient conditions (Brutsaert et al., 2020), and adopt a more flexible form, that is, α c E e , in which α c is an adjustable parameter, and E e is equilibrium evaporation and can be approximated using the method proposed by Slatyer and Mcllroy (1961):  Han and Tian (2018).In power function, X = w i E w /E pa , E w is wet-environment evaporation rate, and w i is wetness index.More detailed descriptions can be found in Szilagyi and Crago (2023).For S1, it has an adjustable parameter b, and α PT is expressed as a function (f) of the estimated wet-environment air temperature (T w ).For S0, it does not have any parameter as α PT = f(T w ) and b ≡ 2.

Table 2
The Different Functions of Generalized Complementary Relationship where Δ (kPa°C −1 ) is the slope of the saturated vapor pressure curve at a given air temperature; γ (kPa°C −1 ) is the psychrometric constant; R n is net radiation at the land surface; G is the soil heat flux and can be ignored due to the value being much less than R n at a daily or much longer time scale; R n and G are expressed in water depth (mm day −1 ); D (hPa) is the saturated water vapor pressure deficit; u 2 (m/s) is the mean wind speed at 2 m above ground; and f (u 2 ) is a wind function.The performance of Equation 1 using average values of atmospheric variables is insensitive to f (u 2 ) in the context of the complementary approach (Liu et al., 2018;Zhang et al., 2017), thus the Penman wind function is adopted in this study: The polynomial function has one-parameter and other three functions (i.e., sigmoid, exponential, and power) have two-parameters.For power function, it has one adjustable parameter b when α PT (the constant in the original Priestley-Taylor equation) is expressed as a function of the wet-environment air temperature (T w ); it does not have any parameter if α PT = f (T w ) and b ≡ 2. The parameter α c is similar to α PT but is fundamentally different in physical meaning.As these free parameters are predictable and flexible at different timescales (Wang et al., 2021), the complementary functions can be applied at multiple temporal timescales from hourly to interannual.The polynomial (B15), sigmoid (H18), exponential (G21), and power with no-parameter (S0), one-parameter (S1) and two-parameter (S2) functions of GCR are adopted in this study.

Integration of LHMs and GCR
The integration of GCR and LHMs by modifying E estimation differs among models according to distinct model structures.E-estimation methods of LHMs and GCR-LHMs are shown in Table 3.The input of the E module in LHMs is E pa , while that of GCR-LHMs is E pa and E e or E w .In AWBM and TANK models, E directly depletes from P and S (linear subtraction).E = E pa when the water supply is unlimited; otherwise, E < E pa .For the GR2M model, E is the non-linear function of S, P, E pa , and X 1 .GCR-AWBM, GCR-GR2M, and GCR-TANK replace E pa with E GCR (E calculated by the GCR method).Regarding the SIMHYD model, E is the sum of INE and E s .While in GCR-SIMHYD, the interception submodule is abandoned.Moreover, the integrated models do not require additional input data.

Model Calibration and Evaluation
The Genetic algorithm (Whitley, 1994) is adopted to optimize the parameters of original LHMs and integrated GCR-LHMs.For GCR-LHMs, the GCR models are calibrated together with LHMs against Q observations.The   Kling-Gupta coefficient (KGE) is a parameter optimization and model evaluation fitness function.KGE can simultaneously quantify the consistency, deviation, and variability of modeling (Gupta et al., 2009).For all catchments, the first year is used as a warm-up period, and the rest, 2/3 and 1/3 data are used for calibration and validation, respectively.The Equation of the fitness function is as follows: where r is the Pearson correlation coefficient between simulated and observed series; θ is the standard deviation ratio; and β is the bias ratio; Y obs and Y sim are observed and simulated values, respectively; variables with overbar denote average value; σ obs and σ sim are standard deviations for observation and simulation series, respectively; and cov is the covariance between observation and simulation.The KGE can vary from −∞ to 1.The closer the KGE approaches 1, the better the model performs.When KGE = 1, the simulated series is the same as the observed series.
This study selects R-Square (R 2 ), root mean square error (RMSE), and KGE SRM criteria to quantify model performance along with KGE.The R 2 is the square of Pearson correlation coefficient (r), representing the fit goodness between the simulated and observed series, varying from 0 to 1.When R 2 = 1, the model can fully explain all changes of observed series.The RMSE measures deviation between simulated and observed series and is sensitive to outliers, the smaller RMSE indicates the better performance.The parameters of distinct LHMs and corresponding GCR-LHMs are inconsistent, but complex models with more parameters are more flexible in fitting calibration data.The KGE SRM is a modified form of KGE criteria.It eliminates the influence of parameter numbers on model performance by applying a model complexity control method of structural risk minimization (SRM; Schoups et al., 2008).A smaller KGE SRM means a better estimation of the hydrological variables.The assessment criteria R 2 and KGE SRM are calculated as follows: where k is the number of parameters to measure model complexity, N is the length of the hydrological observation series.

Catchments and Data
This study identified 2112 catchments for which reliable Q data are available, allowing performance evaluation of LHMs and GCR-LHMs.The locations of these catchments are shown in Figure 2.These catchments have at least 10-year consecutive runoff records without missing a value from 1971 to 2020.Furthermore, they all satisfy long-term water balance, and the annual mean Q, E pa , and P agree roughly (±30%) with the Budyko framework (Fu, 1981;Zhang et al., 2004), as shown in Figure S1 in Supporting Information S1.According to the Köppen-Geiger climate classification (Kottek et al., 2006), these 2112 catchments are located in four typical climates, with the largest number of catchments (1,275) in the warm temperate climate zone, followed by snow (488), equatorial (276), and arid climates (73), as shown in Figure 2b.Catchments with polar climates were not collected due to insufficient numbers for meaningful analysis and interpretation.The runoff data of the 2112 catchments were obtained from multiple publicly available data sets, including the Global Runoff Data Center Both LHMs and GCR-LHMs are driven by the same global meteorological data from the Climate Research Unit and Japanese reanalysis version 2.2 (CRU-JRA v2.2) forcing data sets (Harris et al., 2014(Harris et al., , 2020;;Kobayashi et al., 2015), including air temperature (T), vapor pressure, downward shortwave radiation, downward longwave radiation, and wind speed with a spatial resolution of 0.5°.The basin-scale times series of meteorological variables are aggregated from the gridded data using the area-weighted average method, considering the fractional weight of the boundary grid.E pa and E e, as model inputs, are derived from Equations 1 and 2 using these meteorological data.
To investigate differences in model performance between catchments, catchment characteristic variables were collected, including drainage area (AREA) and normalized vegetation index (NDVI).AREA was provided in the CAMELS and GRDC data sets and was used directly.The NDVI is a multi-year mean value during 1982-2015, obtained from global inventory modeling and mapping studies data sets (GIMMS; Tucker et al., 2005;Zeng et al., 2013).Note that multi-year mean values of NDVI were adopted to characterize the vegetation conditions of catchments, so the coincident time series of NDVI and hydrometeorological data was not required.The attribute characteristics and hydrometeorological characteristics of all catchments had great diversity.The area ranged from 4.32 to 2.9 × 10 6 km 2 , and the mean annual P and Q varied between 37 and 3,502 mm and between 0.5 and 2,657 mm, respectively.E pa was 382-2,458 mm, and the aridity index (AI = E pa /P) varied between 0.2 and 8.9.NDVI had a range of 0.0-0.84.

Performance of Original Models
The performances of the four LHMs for Q simulation in all 2112 study catchments are shown in Figure 3. LHMs can reproduce most catchments of the monthly observed Q series.The median KGE of AWBM, GR2M, SIMHYD, and TANK models is 0.39, 0.71, 0.67, and 0.63, respectively (Figure 3i).The correlations (R 2 ) between the observed and simulated series are lower than KGE, with 0.38 (AWBM), 0.65 (GR2M), 0.63 (SIMHYD), and 0.47 (TANK; Figure 3j).The GR2M model has the best median KGE and R 2 performance, followed by the SIMHYD, TANK, and AWBM.Performance has different spatial distributions among models.AWBM performs better in South America and Australia and relatively worse in North America and Europe (Figures 3a and 3b).GR2M and SIMHYD perform well in most catchments and poorly only in the northern regions of North America (Figures 3c-3f).TANK performs better in South America and Western Europe and relatively worse in North America and part of Australia (Figures 3g and 3h).Basically, LHMs perform better in South America and Europe and poorly in North America.
and SIMHYD models have median KGE of all different climates (Figures 4b and 4c).But AWBM and TANK perform poorly in snow and arid climates, with negative median KGE (Figures 4a and 4d).Model robustness is highest in the equatorial climate with the inter-quantile range of 0.24 ± 0.02 for KGE, followed by warm temperate climate (0.28 ± 0.04), snow climate (0.32 ± 0.07), and arid climate (0.35 ± 0.05).Overall, these results indicate that the applicability and robustness of LHMs vary noticeably across catchments and climates.Therefore, further improvement is still necessary for wider applicability and higher robustness of LHMs.

Performance of Integrated Models
Figure 5 compares the performance of LHMs and GCR-LHMs in simulating Q. Principally, the GCR-LHMs outperform the LHMs, except for five models (i.e., S0-AWBM, S0-GR2M, S0-SIMHYD, S1-AWBM and S2-SIMHYD) in part of study catchments, amongst 24 GCR-LHMs.The improvement is reflected in the higher position of KGE lines of GCR-LHMs on those of LHMs (in Figure 5) and higher median values (inserted boxplots in Figure 5).The degree of improvement varies across catchments, models, and functions.For AWBM, GR2M and TANK models, the improvement are more obvious for smaller KGE (Figures 5a, 5b and 5d), while improvement for the GCR-SIMHYD are more obvious within the KGE range of 0.5-0.8(Figure 5c).The GCR-AWBM model has the greatest improvement in terms of median KGE, followed by GCR-SIMHYD, GCR-TANK, and GCR-GR2M.It is noted that GCR-LHMs performance with S0 function is marginally worse than LHMs for three of four LHMs.This suggests that parameter b in power functions has non-negligible effects on model performance and local calibration is recommended when integrating with LHMs.Basically, CR-based non-linear models are promising to be integrated with LHMs and have similar performance.Relatively, performance of B15-LHMs is slightly better than that of other GCR functions with different LHMs.Thus, results of B15-LHMs only are presented and discussed hereinafter to demonstrate the advantages by integration of GCR functions with LHMs.
LHMs and GCR-LHMs in terms of percentage of catchments with KGE > 0.5 (i.e., acceptable catchment in performance) and ΔKGE > 0 (i.e., improved catchment in performance) during calibration, validation, and entire study periods are summarized in Table 4.The percentage of catchments with KGE > 0.5 of four GCR-LHMs are all higher than that of LHMs, which suggests that the integration improves model applicability.The mean increases for AWBM, GR2M, SIMHYD, and TANK four models are 11.4 ± 4.0%, 7.3 ± 1.9%, and 10.7 ± 3.6% in calibration, validation, and entire study periods, respectively.For the percentage of improved catchments, the mean values of the four models are 80.9 ± 4.1%, 65.0 ± 5.3%, and 77.7 ± 5.0% in the three periods, respectively.In which, the improvement of GR2M is the maximum (84.3 ± 3.5%), while TANK is the minimum (65.3 ± 3.7%).The improved catchments are all more than 50% regardless of models or periods, that is, the GCR-LHMs improved Q simulations for most of the study catchments.In summary, outperform LHMs regarding the median KGE and percentages of the acceptable and improved catchment, directly indicating that GCR-LHMs are more reliable and accurate in Q simulation for most catchments.The performance of LHMs integrating GCR still varies with climates.Performance and improvement of GCR-LHMs have apparent spatial variability (between-catchment) and model variability (between models).The integration of GCR improves the simulation ability and enhances the applicability of LHMs.

Dependence of Model Performance on Catchment Characteristics
The spatial variability in the performance of LHMs and GCR-LHMs (Figures 3, 4, and 6) may depend on catchment characteristics.Four catchment characteristics (i.e., AREA, AI, T, and NDVI) are focused on this aim.Figure 7 summarizes the linear correlation (r) between the four characteristic variables and model performance.
As for LHMs (Figure 7a), the four catchment characteristics have significant correlations (p-value < 0.05) with performances of all four models.The correlation of AI with model performance is highest (−0.44 ± 0.04), followed by that of NDVI (0.30 ± 0.03), T (−0.14 ± 0.03), and AREA (−0.10 ± 0.03).Note that in subsequent results/discussions, unless otherwise stated, the significance level is p-value <0.05.For GCR-LHMs (Figure 7b), AI and NDVI significantly correlate with model performance, with r of −0.43 ± 0.05 and 0.40 ± 0.02 (NDVI), respectively.AI and NDVI have stronger correlations with the model performance of LHMs and GCR-LHMs than the other factors.AI has a negative correlation with model performance, while NDVI is positive.Therefore, LHMs and GCR-LHMs perform poorly in arid or poor-vegetated regions.Note that the r between T and model performance of LHMs and GCR-LHMs are opposite, but some of the latter are is insignificant.
These linear correlations between model performance change before and after integration (i.e., ΔKGE) and characteristic variables are almost insignificant (Figure 7c).Furthermore, these correlations are inconsistent amongst models, with the significant relationship between KGE and T being positive in AWBM (0.07) and negative in GR2M (−0.07), with the significant relationship between KGE and AI being negative in SIMHYD (−0.13) and positive in other three models.That is, the role of integration has no definite relationship with a certain variable.Results showed that ΔKGE of different catchments does not depend on catchment characteristics, while the performances of LHMs and GCR-LHMs show apparent dependence on certain catchment characteristics.It indicates that integration of GCR into LHMs can generally improve the model performance in terms of capability and applicability.

Performance in Water Balance
The water balance elements (e.g., Q and E) affect each other in the simulation process of rainfall-runoff models based on the water balance principle (Jaskierniak et al., 2019).In this study, the integration of GCR replaces the over-simplified E estimation method with a more rigorous physical mechanism to obtain a more accurate simulated E as a potential expectation.In the water balance, P is mandatory to input, and if Q and E are accurately estimated, ΔS as the residual term is also accurate.Due to the lack of E observations at same time scale as runoff, 10.1029/2023WR035567 of the water-balance E values (i.e., E wb = P − Q) are derived to compare the performance of LHMs and GCR-LHMs in simulating E on a multi-year mean basis.As shown in Figure 8, GCR-AWBM, GCR-GR2M, GCR-SIMHYD, and GCR-TANK outperform corresponding LHMs for E simulation, reflecting in higher R 2 and smaller RMSE between simulated and observed series.Amongst models, the improvement is clear in GCR-AWBM and GCR-SIMHYD, with R 2 increasing by 0.24 and 0.41, RMSE decreased from 174 to 61 and 248 to 34 mm, respectively; while marginal in GCR-GR2M and GCR-TANK, with R 2 increasing by only 0.01, RMSE decreased from 47 to 39 and 45 to 39 mm, respectively.
Basically, the improvement in simulating Q and E are synchronous for GCR-LHMs, and the Q improvement is greater than E (Figures 6 and 8).Two reasons mainly cause this; one is that LHMs focus on rainfall-runoff relationships and less consider the soil moisture content recharge and dissipation processes closely related to E estimation.The LHMs integrate an evaporation theory based on more physical mechanisms to improve E simulation, further improving Q simulation through basin water balance relationships.The other reason is that parameter optimization considers only runoff processes and hardly reflects the different aspects of model simulation (Adeyeri et al., 2020;Rajib et al., 2016), resulting in a greater improvement of Q than of E simulation.In short, improvement of Q and E simulation capacity appears in most catchments.These improvements have spatial variability (between catchments) and model variability (between models), which are caused by distinct model structures and model parameters (Hughes et al., 2014;Jayathilake & Smith, 2020).The consistent improvement and more balanced performance of simulating Q and E prove the rationality of integrating GCR.Overall, the integration of GCR effectively improves the water balance modeling capability of LHMs.

Regional Variability of Model Performance
The regional variability in the performance of LHMs and GCR-LHMs relates to climatic types (Figures 4 and 6) and catchment characteristics (Figure 7).LHMs and GCR-LHMs perform well in warm temperate and equatorial climates, and poorly in arid and snow climates.The results are consistent with Zhang et al. (2016) and may be caused by two reasons; one is that most LHMs are developed in humid or semi-humid regions (Gudmundsson et al., 2012) and assume a linear relationship between E and E pa .In humid regions, E is mainly limited by energy and exhibits a strong linear relationship with E pa .While in arid and semi-arid regions, E is mainly limited by water availability and does not have a good linear relationship with E pa but a better relationship with P (Cheng et al., 2011).E in the arid climatic zone does not satisfy this hypothesis and thus performs poorly.The other reason is that the snowfall-snowmelt module is absent in the hydrological models used in this study, and thus both LHMs and GCR-LHMs perform poorly in snow climate zones.It is noted that GCR-LHMs outperform LHMs regardless of climatic type, but still cannot reproduce Q dynamics in a snow climate, which can be further improved by incorporating a snowfall-snowmelt module in the future.
The correlations between GCR-LHMs performance and catchment characteristics are more reasonable than those of LHMs.LHMs and GCR-LHMs ignore the spatial heterogeneity of catchments and can hardly reflect the hydrological processes of large catchments.The dependence of GCR method on spatial scales is smaller than that of LHMs, so the correlation between GCR-LHMs performance and AREA are less than LHMs.As for T reflecting energy-driven indirectly, it is negatively correlated with the performance of LHMs but is a significant positive correlation with that GCR-LHMs.The latter is consistent with the recognized variation of hydrological model performance from snow to warm climatic types and thus is considered more reasonable and can be regarded as justification for integration.For NDVI, GCR and LHMs well perform in simulating E and Q for well-vegetated catchments, respectively, and it is reasonable that GCR-LHMs performance is more highly correlated with NDVI.
Regarding model performance change before and after integration (i.e., ΔKGE), it does not depend on any catchment characteristics (Figure 7c).The main reason is that the GCR explains the pervasive coupling relationships between land surface E and atmospheric state (Brutsaert, 2015;Brutsaert et al., 2020) and can reliably and independently estimate E of different catchments in different climatic zones (Li et al., 2020;Liu et al., 2016;Zhang & Brutsaert, 2021).Thus, integrating GCR into LHMs improved capability in water balance modeling, extensively enhanced performance, and did not depend on catchment characteristics.

Influence of Catchment Vegetation Coverage on Runoff Simulation
LHMs and GCR-LHMs perform better in catchments with greater vegetation coverage (Figures 7a and 7b).Taking four catchments with vegetation coverage gradients as examples, that is, 4235191_GRDC (NDVI = 0.38), 101005_GB (0.59), 6503851_GRDC (0.71), and 15930000_BR (0.83), the influence of vegetation coverage on simulating Q are analyzed and shown in Figure 9.The simulating Q of well-vegetated catchments (i.e., 6503851_GRDC and 15930000_BR) have the same variability as the observation, that is, both increase and decrease simultaneously and peak at the same time.This can be attributed to two reasons; one is that vegetation coverage changes the rainfall-runoff relationship by enhancing infiltration and reducing flow and velocity (Tang et al., 2021).As a result, runoff processes in well-vegetated catchments are more regular and stable, and LHMs with non-time-varying parameters perform better.The other reason is the limitation of the simulation capability of LHMs and that the estimation of median runoff is more accurate than low or high runoff.And greater vegetation coverage produces greater runoff regulation, further achieving better runoff simulation.
Essentially, vegetation coverage can be considered a surrogate of climatic conditions, and greater NDVI is expected to have higher E (Chen et al., 2013).Runoff processes are driven by P and E events.GCR-LHMs successfully capture runoff variability and produce smaller magnitude differences by integrating the GCR method.This advantage is more prominent, especially in catchments where E accounts for a large proportion of water losses.The closer relationship between model performance and NDVI in GCR-LHMs is a precise contribution of E improvement, directly indicating that the integration of GCR is correct and appropriate.

Influence of Model Complexity on Performance
The GCR describes the non-linear exhibition of complementary behavior of E and E pa based on common observations and has developed different typical functions with 0-2 parameters.The same validity of these typical functions has been demonstrated (Gao & Xu, 2021;Szilagyi et al., 2022).Thus, considering marginally poor performance of GCR-LHMs with no-parameter S0 function shown in Section 4.2, the single-parameter function proposed by Brutsaert (2015) is adopted in this study to maintain the simplicity of LHMs and better performance as much as possible.GCR-AWBM, GCR-GR2M, and GCR-TANK models increase one parameter more than their corresponding LHMs.Schoups et al. (2008) pointed out that more parameters lead to more flexible model applications.The extensive improvement in simulating Q is mostly attributed to improved model structure and increased model parameters have a secondary influence as indicated by the KGE SRM criterion.This criterion eliminates the influence of parameter numbers on model performance by applying a model complexity control method of structural risk minimization (SRM; Schoups et al., 2008), with smaller KGE SRM indicating better performance.
The percentages of catchments with ΔKGE SRM < 0 amongst the four models are greater than 50%, with 79.4 ± 4.3%, 62.9 ± 4.9%, and 80.6 ± 3.9% for calibration, validation, and entire study periods, respectively.Excluding the effect of more flexible applications of LHMs due to added parameter numbers, integration still brings improvements in simulating Q of most catchment.In general, model structure adjustments by integrating GCR are reliable for improving simulated Q.The integration does not markedly increase parameter numbers and change model structure.In addition, LHMs used E pa to input the evaporation submodule, while GCR-LHMs used E pa and E e or E w , derived from common meteorological data.GCR-LHMs retain the advantages of LHMs, including computational efficiency, simplicity of model structure, and readily available inputs (common hydrometeorological data).This is the most important success of this study compared with other attempts (Deng & Sun, 2012;Yuan et al., 2008) to improve the capability of LHMs.

Conclusions
The LHMs are designed to simulate Q with relatively simple model structures.However, E, one of the key hydrological processes, is commonly treated empirically in the LHMs, limiting their general applicability.This study incorporated the GCR method into the LHMs to improve their E representations.Results showed that, basically, the integrated models with different typical GCR functions outperformed the original models in most studied catchments with a broad variety of climatic and vegetation conditions.The specific differences amongst integrated models with different GCR functions will be further investigated in future.Furthermore, the improvement in the model performance is independent of the catchment characteristics, suggesting that incorporating the GCR the is appropriate.One of the of incorporating GCR into the LHMs is that it improves the model's accuracy and applicability while retaining model simplicity and only using readily available input data.This study demonstrates the potential of the GCR in hydrological modeling, and it can help us better understand catchment water balance partitioning to provide more realistic estimates of both Q and E.

Figure 1 .
Figure 1.Diagram for structure of lumped hydrological models.

Figure 2 .
Figure 2. (a) The locations of study catchments, background colors represent climatic zones according to Köppen-Geiger climate classification; and (b) The number of catchments in different climate zones.

Figure 3 .
Figure 3. Performance in Q simulation of lumped hydrological models.Subplots in the left and right panels show model performance evaluated by KGE and R 2 , respectively.The top panels are AWBM, followed by GR2M, SIMHYD, and TANK.Red points and blue points in subplots (a-h) represent catchments with KGE > 0.5/R 2 > 0.5 and KGE < 0.5/R 2 < 0.5, respectively.In subplots (i, j), the left, middle, and right of the box represent the 25th (Q1), 50th (Q2), and 75th (Q3) percentiles, respectively; the right and left ends of error bars are cutoff points of outliers, which are Q3 + 1.5 inter-quantile range (IQR) and Q1 − 1.5 IQR, respectively.IQR is the interquartile range, that is, Q3 − Q1.Notice that KGE < 0 is marked as 0 for better visualization.

Figure 5 .
Figure 5.Comparison of Q simulation performance of lumped hydrological models and GCR-LHMs in all study catchments.Subplots (a-d) show exceeded percentage of catchments that KGE of (a) AWBM, (b) GR2M, (c) SIMHYD, and (d) TANK.Inserted boxplots summarize the overall performance of different models, the bottom, middle, and top of the box represent the 25th (Q1), 50th (Q2), and 75th (Q3) percentiles, respectively; the upper and lower ends of error bars are cutoff points of outliers, which are Q3 + 1.5 inter-quantile range (IQR) and Q1 − 1.5 IQR, respectively.IQR is the interquartile range, that is, Q3 − Q1.Note that KGE is truncated by 0 for better visualization.

Figure 7 .
Figure 7. Linear correlations of catchment characteristics with lumped hydrological models performance (subplots (a), evaluated by KGE), GCR-LHMs performance (subplots (b), evaluated by KGE), and model performance change before and after integration (subplots (c), evaluated by ΔKGE, that is, KGE GCR-LHM − KGE LHM ).Blue indicates negative correlation and red indicates positive correlation, with darker colors reflecting higher correlation.Note that numbers are the Pearson correlation coefficient (r), and those without numbers fail p-value < 0.05.

Figure 8 .
Figure 8. Performance of lumped hydrological models (LHMs), and GCR-LHMs with B15 function in simulating E. The blue represents performance of LHMs and the red is that of GCR-LHMs.The black solid lines are 1:1 fitting lines.

Table 1
The Parameters of Four LHMs (INE), plant transpiration and soil evaporation (E s ; E po /E pa , y = E/E po .In sigmoid function, x' = E e /E pa , m and n are functions of parameter α PT and c.More detailed descriptions can be found in

Table 3
Evapotranspiration Estimation Methods of LHMs and GCR-LHMs

Table 4
Model Performances in Terms of Percentage of Catchments With KGE > 0.5 and ΔKGE > 0 LEI ET AL.