Hybrid Theory‐Guided Data Driven Framework for Calculating Irrigation Water Use of Three Staple Cereal Crops in China

Current irrigation water use (IWU) estimation methods confront uncertainties warranting further attention, primarily stemming from constraints within model structure and data quality. This study proposes a hybrid framework that integrates multiple machine learning (ML) methods with theory‐guided strategies to calculate IWU for three principal cereal crops within the Chinese agricultural landscape. We generated high resolution time series data sets of evapotranspiration and surface soil moisture (SM) using remote sensing resources. ML techniques, along with the Bayesian three‐cornered hat ensemble, were employed to drive multiple remote sensing‐derived data sets in IWU calculation. We applied two theory‐guided mechanisms to quantify irrigation signals: first, converting original SM values into logarithmic terms, and second, extracting process‐based SM residuals. Proposed framework has been validated at 12 field stations across China, yielding coefficient of determination (R2) ranging from 0.54 to 0.70, and root mean square error (RMSE) spanning 278–335 mm/yr. Our framework demonstrates considerable strength in IWU estimation when compared to reported IWU values form 341 cities across China. Specifically, for rice, wheat, and maize, the R2 values range from 0.78 to 0.83, 0.68 to 0.76, and 0.53 to 0.64, respectively, with corresponding RMSE measuring 0.22–0.25, 0.10–0.12, and 0.11–0.13 km3/yr, respectively. These findings highlight the effectiveness of theory‐guided strategies in discerning irrigation‐related information, thereby improving overall model performance. Attention should be directed toward the uncertainties in evapotranspiration and precipitation products on model performance, which remained modest, with a relative change of less than 5%.


Introduction
Irrigation is a critical agricultural practice essential for crop growth and enhancing grain production.It also accounts for over 70% of the world's freshwater withdrawal (Siebert et al., 2015).In China, irrigated croplands constitute approximately 50% of the total cropland area but contribute significantly, contributing to about 75% of the nation's total grain output (C.Zhang et al., 2022).This highlights the substantial impact of irrigation on China's agricultural landscape and places considerable demands on its limited freshwater resources (Kang et al., 2017;Sun et al., 2015).Recent studies suggest that unsustainable water use, where water consumption exceeds local renewable capacity, has accounted for 66% of China's irrigation water use (IWU) (Rosa et al., 2019).Moreover, as irrigation involves direct human intervention in the natural water cycle, adopting unsound irrigation practices can detrimentally affect the regional ecological environment.These practices can lead to adverse consequences such as groundwater depletion and land salinization (W.Liu et al., 2022;J. Wu et al., 2013).Therefore, accurate IWU estimation is pivotal not only for sustainable water resources management and food security but also for assessing the profound environmental impact associated with human irrigation practices (Fisher et al., 2017).
The conventional approach of obtaining IWU relies heavily on statistics provided by irrigation district management departments, which demand considerable manpower and material resources due to the diversity of irrigation schemes, infrastructure, and crop types across regions (X.Li et al., 2020).Recent advancements in satellite sensors have facilitated the use of remote sensing technology to obtain irrigation-related variables such as evapotranspiration (ET) and precipitation, offering a potential avenue for large-scale IWU estimation (Agha-Kouchak et al., 2015;Y. Chen et al., 2019).Various approaches have emerged to estimate IWU.Brocca et al. (2013) proposed the SM2RAIN algorithm, which calculates the total water entering irrigated farmland based on the water balance equation and then estimates IWU by subtracting local precipitation.However, reliance on rainfall data presents reliability challenges at a large scale.S. V. Kumar et al. (2015) and Zaussinger et al. (2019) integrated microwave soil moisture (SM) products and modeled SM products without an explicit irrigation scheme to capture irrigation signals through product residuals.Yet, limitations such as coarse resolution and data gaps in SM products constrain the effectiveness of this method.Moreover, these theory-based methods often rely on energy balance models or water balance models to quantitatively describe the complex relationships between IWU and related variables, leading to significant uncertainty in IWU estimation.Recent researcher has proposed methods to mitigate uncertainty by integrating multisource products (K.Zhang et al., 2022;Zohaib & Choi, 2020), but these theory-based methods remain constrained by model limitations and data gaps.
Machine learning (ML) stands as a promising avenue for IWU estimation, offering an alternative to theory-based methods by leveraging historical data to discern underlying patterns and rules, rather than explicitly modeling physical relationships between variables (Reichstein et al., 2019).The application potential of ML in hydrology and agriculture has been extensively explored.For example, as early as 1995, the effectiveness of Artificial Neural Network (ANN) in simulating rainfall-runoff processes was demonstrated (Hsu et al., 1995).Similarly, other ML methods such as Random Forest (RF) have been utilized in hydrology and agriculture to predict various factors such as yield, fertilizer requirements, and pesticide usage (Everingham et al., 2016;Sherafatpour et al., 2019).Despite these advancements, there has been limited exploration of ML's potential for IWU estimation.One challenge is selecting the most suitable ML model, given the multitude of models developed over decades, each with specific advantages.One potential solution is integrating results from different ML approaches.For instance, Fleming et al. (2021) employed multi-model averaging to integrate water supply forecasts in the western US from six different supervised learning algorithms, demonstrating the benefits of multi-model ensemble techniques.In addition to multi-model averaging, Bayesian approaches are increasingly recognized for their ability to harmonize information from observations and diverse outcomes generated by various ML models (Zounemat-Kermani et al., 2021).Within the Bayesian framework, this amalgamation produces a probability distribution of ensemble members, offering a perspective for multi-model ensemble.Furthermore, the aforementioned theory-based IWU estimating approaches provide several theory-guided knowledges that reflect irrigation information.While ML is primarily data-driven, incorporating these theory-guided knowledges into ML models may enhance their effectiveness (Karpatne et al., 2017).
On the other hand, high-quality data sets with high spatiotemporal resolution are required for all existing methods.However, current remote sensing products could not fully meet these requirements due to factors such as satellite orbit and sensor performance (Dorigo et al., 2010).Prevalent surface soil moisture (SSM) products such as Soil Moisture Active Passive (SMAP) and ESA CCI suffer from coarse spatial resolution and data gaps, leading to considerable uncertainty in IWU estimation (Y.Y. Liu et al., 2012).Consequently, there is a pressing need to explore effective data filling methods to enhance the usability of these SSM products (K.Liu et al., 2023).Similarly, data sets associated with ET models, such as MODIS (Moderate Resolution Imaging Spectroradiometer), pose their own challenges.Data gaps and the temporal synthesis of provided data at 8-day intervals present substantial obstacles for methods reliant on daily data (Cammalleri et al., 2013).To address these constraints, there is growing recognition of the value of process-based ET models.Models like the Priestley-Taylor Jet Propulsion Laboratory and Penman-Monteith-Leuning (PML) offer the capability to quantify intricate spatiotemporal dynamics of thermal and water processes (Yao et al., 2013).However, it is essential to note that thermal-based ET models, notably Surface Energy Balance System and surface energy balance algorithm for land (SEBAL), demonstrate remarkable accuracy in irrigation croplands (Ma et al., 2012).Their sensitivity to changes in surface temperature and SM makes them suitable for regions where irrigation significantly alters the surface energy balance (Knipper et al., 2018).Within irrigated croplands, these models excel in capturing the precise impact of irrigation practices on ET, surpassing the capabilities of some conventional methods.Given these considerations, addressing the demand for higher-quality data sets related to irrigation variables emerges as a matter of utmost urgency.
In this study, we propose a hybrid framework that integrates data-driven models with theory-guided strategies to calculate China's IWU for three staple cereal crops.Leveraging high-resolution time series remote sensing data,

Study Region
Our study focuses on the Chinese mainland, a vast region comprising 31 provinces, characterized by complex crop planting structures and cropping intensity (Piao et al., 2010).The staple cereal crops cultivated in this region are wheat, rice, and maize, collectively contributing to over 90% of the total grain output (Luo et al., 2020).Figure 1a illustrates the geographic distribution of these crops.Rice cultivation is predominant in the humid regions of South and Central China, with some cultivation also observed in Northeast China.In contrast, wheat and maize are primarily grown in the relatively arid regions of North China.Additionally, a substantial portion of croplands in these regions practices multi-cropped, such as maize-wheat rotation, which is prevalent in the North China Plain. Figure 1b presents the proportions of reported IWU for three cereal crops across provinces.

Study Design
The proposed hybrid framework for estimating IWU comprises three main components (Figure 2): theory-based processes implementation, data-driven models establishment and integration, and performance analysis.
The primary objective of estimating IWU using ML is feature selection.IWU is defined by the generalized water balance formula as the residual between ET, drainage (D), soil water storage dynamic (dSM), and precipitation (P) for each time step (t) (Y.Chen et al., 2019;C. Zhang & Long, 2021): Applying theory-guided strategies to these variables could yield features that better capture irrigation signals.Part 1 of our framework employs two theory-guided strategies to enhance irrigation information within SM data: converting the original SM data into its logarithmic form (details in Section 3.2) and extracting SM residuals (details in Section 3.3).The outputs of Part 1 are three theory-guided features related to SM: lnSSM, lnRZSM, and soil moisture residuals (ResSM).These features, along with precipitation and ET, constitute all the features utilized in this study.
Subsequently, Part 2 of our framework utilizes these features and reported IWU to drive five ML models and generate corresponding IWU estimations (details in Section 3.4).Furthermore, the Bayesian three-cornered hat (BTCH) approach is applied to integrate the IWU estimations generated by the five ML models (details in Section 3.5).
Part 3 of our framework primarily focuses on performance analysis.To verify the applicability of our proposed framework, we estimated the IWU at field stations based on the same process, but replacing satellite data with field measurements.The uncertainty analysis includes multiple error sources potentially affecting model performance, such as the quality of remote sensing products and the number of training samples.To evaluate the effectiveness of the proposed theory-guided strategies, we designed an ablation experiment to observe model performance without these strategies.We also compared the results between our work and two previous theorybased IWU estimating studies.

Data Set
A variety of remote sensing-derived products were collected to establish and evaluate national IWU, including precipitation, ET, and SM.A distribution and phenology data set were used to filter these remote sensing products temporally and spatially.Among these gridded data sets, SM was initially processed as the average value during the annual phenological period, while precipitation and ET were treated as cumulative values.After temporal filtering, these parameters were spatially accumulated to match the distribution of the three cereal crops.Field measurements from stations were collected to validate the accuracy of these products.Multiple sources of products were introduced to assess the potential uncertainty associated with satellite-derived products on a large scale.Detailed information regarding these data sets can be found in Table 1.

Precipitation
We used the Global Precipitation Measurement (GPM) data set during 2003-2013 as a crucial element in our model.The accuracy and applicability of the GPM data set in hydrological field have been extensively verified  (Tan & Duan, 2017).This half-hourly 0.1°× 0.1°data set, generated using the Integrated Multi-satellite Retrievals for GPM (IMERG) algorithm, combines precipitation estimates from satellite retrieval and ground precipitation gauge data (Hou et al., 2014).
To evaluate the potential impact of product uncertainty on our model, we incorporated two additional precipitation data sets, that is, the Tropical Rainfall Measuring Mission (TRMM) and the new fifth-generation atmospheric reanalysis of ECMWF (ERA5).The TRMM data set, a predecessor of GPM, employs sensors similar to those on the GPM satellite, including the Microwave Imager and Precipitation Radar.Provided at a 3-hourly temporal resolution and a spatial resolution of 0.25°, this data set integrates data from TRMM satellite instruments and external sources, such as Defense Meteorological Satellite Program (DMSP), using the TRMM Multi-satellite Precipitation Analysis (TMPA) algorithm (Pombo and de Oliveira, 2015;Wolff et al., 2007).ERA5, released by the Copernicus Climate Change Service, is the latest reanalysis product with higher spatialtemporal resolution (0.1°and hourly) (Jiang et al., 2021).

Evapotranspiration
Accurate estimation of ET is crucial for efficient agricultural water management, yet current algorithms face challenges at a continental scale, especially across diverse climatic conditions.To address this, we used various satellite observations to produce a 1-km daily ET data set using the SEBAL.The SEBAL model has shown exceptional accuracy and adaptability, requiring minimal ground-based data (Tang et al., 2013).In irrigated croplands, where water application alters the surface energy balance, temperature and moisture variations are pronounced.SEBAL effectively captures these variations by considering key energy components such as incoming solar radiation, outgoing longwave radiation, sensible heat flux, and latent heat flux, crucial for ET estimation.Details on generating the daily ET data set with SEBAL on the Google Earth Engine platform can be found in Laipelt et al. (2021).This study used MODIS data (i.e., surface albedo, surface temperature, and Normalized Difference Vegetation Index (NDVI)) and meteorological data from the ERA5 as model driving input.Land surface temperature (LST) is a crucial parameter for ET calculation in the SEBAL model; nevertheless, it is generally constrained by missing or unreliable pixels.To address this, we applied a spatiotemporal regression strategy to gap-fill missing values in MODIS daily data, according to the studies of K. Liu et al. (2022).Specifically, we employed a RF model to capture the nonlinear relationship between LST and explanatory variables (i.e., NDVI, elevation, slope, aspect and land surface emissivity) and used geographically weighted regression to calibrate the residuals derived from the RF model.
We obtained two additional ET products for verification and uncertainty analysis.The first is MODIS ET, which utilizes a vegetation index model to estimate ET using metrics such as leaf area index (LAI) and meteorological inputs including vapor pressure deficit (Mu et al., 2011).This product provides a global 1 km ET data set with an 8-day temporal resolution.The second product, PML-V2 (Penman-Monteith-Leuning, Version 2), developed by Y. Zhang et al. (2019), integrates a coupled diagnostic biophysical model with the Penman-Monteith equation.It offers a spatial resolution of 500 m and an 8-day temporal resolution, accessible via the Google Earth Engine platform.

Soil Moisture
Satellite-derived SSM products are often hindered by missing observations and coarse spatial resolution (Kovačević et al., 2020;Peng et al., 2023).To address these issues, we generated a time series of daily SSM with full spatial coverage at a spatial resolution of 1-km.Initially, we utilized a proven gap-filling method, as employed in previous studies (K.Liu et al., 2023), to fill missing data within the 25-km ESA CCI SSM product using satellite observations and a ML algorithm.Subsequently, we downscaled the gap-filled SSM data from 25-to 1km resolution using the RF algorithm, integrating remote sensing data sets, digital terrain characteristics, and climatic variables, as described by H. Zhang et al. (2022).
Given that irrigation modules are typically absent in reanalysis products, we obtained SM products from the ERA5 data set to identify irrigation information.The ERA5 SM product comprises three layers (0-7, 7-28, and 28-100 cm), with the first layer considered as SSM and the root zone soil moisture (RZSM) calculated as a weighted average across all three layers.

Distribution and Phenology of Three Cereal Crops
The study utilized phenological data from ChinaCropPhen1km, a data set developed by Luo et al. (2020), providing precise phenological information for rice, maize, and wheat in China.Covering the period from 2000 to 2015, this data set offers a 1 km gridded resolution.It integrates Global Land Surface Satellite LAI products with the Optimal Filter-based Phenology detection approach.The data set includes detailed pixel-level information on key phenological dates, such as heading, intermediate, and maturity dates, expressed in terms of the day of the year.

Reported Irrigation Water Use and Irrigated Area
The city-level reported IWU and the irrigated area during 2003-2013 were sourced from the Chinese Ministry of Water Resources (Zhou et al., 2020).This data set includes annual records of IWU and irrigated area for 341 cities, categorized into various crops, including rice, maize, wheat, vegetables, and fruits.The spatial distribution of the average value and trend of reported IWU for three cereal crops during 2003-2013 can be found in Figure S1 in Supporting Information S1.While this data set is considered reliable for reflecting actual IWU and irrigated area, it is derived through statistical methods and may not perfectly align with actual water amounts due to losses during transmission (K.Zhang et al., 2022).In our modeling approach, we did not directly use reported irrigated area as an input variable.Instead, we relied on ChinaCropPhen1km data for spatial parameter accumulation due to its practical advantages, such as accessibility and real-time availability.The reported irrigated area primarily served to evaluate the accuracy of ChinaCropPhen1km rather than actively influencing our modeling process.

Data Set for Verifying at Field Stations
Field measurements were primarily obtained from two sources, that is, eight stations from the global network of eddy covariance sites (FLUXNET) and 34 stations from the Chinese Ecosystem Research Network (CERN).The geographical distribution of these stations is depicted in Figure S2 in Supporting Information S1.
FLUXNET operates a global network of flux towers that utilize the eddy covariance method to measure energy flux components, including latent and sensible heat (Wilson et al., 2002).We collected half-hourly measurements of latent heat flux from eight FLUXNET stations and converted them into estimates of ET using the energy balance principle.
CERN, established to monitor China's ecological conditions, comprises ecological stations across various ecosystems, such as forests, croplands, and grasslands (Wang et al., 2016).From 34 CERN stations, we obtained SM field measurements, with a measurement frequency ranging from 4 to 10 days.These measurements were acquired using the neutron probe method, a non-invasive technique known for its convenience, precision, and minimal impact on soil structure.Typically, measurements cover multiple soil depths, from 10 cm to approximately 200 cm.SSM was derived from measurements at the 10 cm depth, while RZSM was calculated as the weighted average of depths ranging from 10 to 100 cm.
To assess the performance of the hybrid model, we selected 12 cropland stations within CERN, providing a broader range of field measurements beyond SM.These measurements include irrigation water intensity (unit: mm), ET, and SM from 2008 to 2014.Precipitation data for these stations were obtained from nearby meteorological stations.Further details of these 12 cropland stations are provided in Table 2.

Delineation of Drainage
Drainage, crucial for calculating IWU, depends on the interplay between saturated hydraulic conductivity and soil pore size distribution.The natural logarithm transformation of SM content is a well-established technique in hydrological studies, simplifying the complex relationship between SM and saturated hydraulic conductivity, as described by the Mualem-Van Genuchten equation (Text S1 in Supporting Information S1).
By utilizing the natural logarithm of SM, we linearized the connection between SM and hydraulic conductivity, facilitating the modeling of drainage processes in soils with varied pore size distributions.This approach aligns with previous research, such as Zhao et al. (2010), which identified a significant logarithmic correlation between SM and the soil evaporation transfer coefficient.Additionally, literature on eco-hydrologic dynamics supports our Water Resources Research 10.1029/2023WR035234 approach by highlighting the non-linear relationship between plant root water uptake intensity and SM, consistent with macroscopic root water uptake models (R. Kumar et al., 2013;Shankar et al., 2012).
In practical application, we substituted the original data set with logarithmic representations of surface soil moisture (lnSSM) and root zone soil moisture (lnRZSM).lnSSM was derived from satellite data, while lnRZSM was computed from the ERA5 product, chosen for its superior performance in capturing root zone conditions.To validate the robustness of this transformation, we conducted an evaluation using 7 years of field measurements (2008-2014) from 12 cropland stations.As depicted in Figure 3, the logarithmic forms of SM and IWU reveal a discernible relationship at these field stations.This empirical observation emphasizes that an increase in soil water content moderates the interaction between IWU and SM, particularly in complex soil environments.

Extraction of Soil Moisture Residual Caused by Irrigation
This study introduces a strategy to infer irrigation activities by leveraging a robust metric-the bias between satellite-observed and modeled SM.This bias, stemming from seasonal hydrological variations during dry periods, provides valuable insights into detecting irrigation practices (Zohaib & Choi, 2020).First, we exploited the sensitivity of satellite sensors to large-scale irrigation, enabling the detection of variations in topsoil moisture and identification of irrigation events (Lawston et al., 2017;X. Zhang et al., 2018).Second, we addressed a significant gap in traditional global land surface models, such as the Global Land Data Assimilation System (Rodell et al., 2004) and Modern-Era Retrospective analysis (Woollen et al., 2011), which often neglect the incorporation  of irrigation as a hydrological process.We combined a satellite-derived SM data set with the ERA5 SM product to extract ResSM associated with irrigation practices.
Daily precipitation data was used to determine irrigation periods.We ranked daily precipitation values greater than zero each year in ascending order and identified an appropriate percentile in this ranked data set to delineate the irrigation period (i.e., rain-free period), when irrigation events may occur.Periods below this threshold were classified as non-irrigation periods.To validate the use of precipitation in determining irrigation periods and selecting an appropriate threshold, we collected daily precipitation data for 12 cropland CERN stations, along with precise records of irrigation dates.If daily precipitation on the irrigation day fell below the selected threshold, it indicated that the designated irrigation period covered an irrigation event.The choice of threshold value significantly affects the coverage of irrigation events within the designated period.Higher thresholds encompass more irrigation events but also extend the temporal duration of the irrigation period.In this study, we opted for a threshold ranked at the twentieth percentile, striking a balance between covering a substantial portion of irrigation events (over 80%) and minimizing the rate of increase in irrigation event coverage as thresholds rise (Figures S3 in Supporting Information S1).
ResSM could be theoretically obtained by integrating the difference between the observed data set and the ERA5 product during the irrigation period: where N represents the number of days within the irrigation period.SM irrigation OBA signifies the observed SM product, which may originate from in-situ data or satellite-derived product.SM Calibration ERA5 denotes the ERA5-based SM product after calibration.
Considering the systematic discrepancies between these two data sources, we conducted a calibration process, leveraging the non-irrigation period for alignment.
where SM non-irrigation ERA5 represents the original ERA5 SSM product during the non-irrigation period, while SM non-irrigation OBA and SM irrigation OBA denote the observed SSM products for the non-irrigation and irrigation periods, respectively.The parameters Bias and y were determined through the least squares method.

Establishment of Machine Learning Model
In this study, we utilized five distinct parameters as features to drive a ML model.These parameters, that is, precipitation, ET, lnSSM, lnRZSM, and ResSM, play essential roles in shaping irrigation dynamics from both natural and human perspectives (X.Zhang et al., 2022).Precipitation and ET act as indicators of natural water replenishment and dissipation within croplands.Meanwhile, lnSSM and lnRZSM provide critical information about soil water storage, vital for understanding moisture availability for crops.Particularly, ResSM quantitatively reflects the intensity of human irrigation activities (Zaussinger et al., 2019).
To utilize these features effectively, we employed a suite of five ML models, each with unique strengths: RF, Multiple Linear Regression (MLR), Support Vector Machines (SVM), Extreme Gradient Boosting (XGB), and ANN.RF and XGB, developed by Breiman (2001) and T. Chen and Guestrin (2016) respectively, are ensemble learning algorithms that use bagging and boosting strategies to mitigate data anomalies.SVM, a powerful algorithm, is employed for regression tasks by identifying hyperplanes that minimize prediction error while maximizing the distance between support vectors and hyperplanes (Smola & Schölkopf, 2004).MLR establishes a straightforward linear regression function between the target label and multiple input features, making it easy to implement.ANN consists of interconnected neurons with continuously updated weights, facilitated by a backpropagation algorithm.While ANN excels in handling complex nonlinear relationships, it is often criticized for its Water Resources Research 10.1029/2023WR035234 BO ET AL. reduced interpretability (Elnashar et al., 2020).A more detailed description of these five models can be found in Text S2 in Supporting Information S1.
We first estimated the IWU at field stations to verify the applicability of ML approaches.We iteratively selected one year's field measurements from 2008 to 2014 as the test set, while using the remaining data as the training set (the flowchart can be found in Figure S4 in Supporting Information S1).Cross-validation was then conducted to optimize hyperparameters, with details provided in Table S1 in Supporting Information S1.The performance of our framework was assessed across multiple years.We then extended the ML approaches to estimate IWU for three staple cereal crops at a national scale, focusing on city-level granularity.Similarly, we alternated one year's data from 2003 to 2013 as the test set, using the remainder as the training set.

Ensemble Based on Bayesian Three-Cornered Hat (BTCH) Method
We employed a BTCH approach to construct an ensemble of IWU predictions by combing outputs from the five ML methods mentioned earlier.This method is recognized for its robustness in integrating diverse models seamlessly without requiring any prior assumptions (He et al., 2020).Unlike certain integration methods that rely on measured data, such as Bayesian model averaging, BTCH stands out for its simplicity.It operates solely based on the similarity between predicted sequences from different models, making it a straightforward and effective approach.Moreover, BTCH ensures that ensemble predictions leverage the strengths of individual models while mitigating their respective limitations, optimizing overall predictive performance.
The BTCH method consists of two main components: the classic TCH segment for assessing model uncertainty and the Bayesian segment for weighing each model's contribution.More detailed information about TCH and BTCH can be found in Text S3 in Supporting Information S1.

Analysis Metrics
The evaluation metrics used in this study include root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (R), coefficient of determination (R 2 ) and a modified version of Kling-Gupta Efficiency (KGE″) (Clark et al., 2021).Mathematical formulas for these metrics are detailed in Text S4 in Supporting Information S1.Additionally, we conducted permutation-based importance analysis to discern the impacts of various parameters on our model.This procedure involves selecting a feature, randomly reordering its values within the data set, recalculating predictions, and evaluating the difference between the new and previous results.Substantial differences indicate higher importance scores for the feature, while minor differences indicate lower importance.

Accuracy of Daily ET and SSM Data Set
We generated high-quality data sets of ET and SSM during 2003-2013 using multiple remote sensing data sets, ML techniques, and process models.The spatial distribution and accuracy of these data sets are depicted in Figure 4.Both ET and SSM data sets exhibit a consistent geographical pattern, with values gradually decreasing from southeast to northwest.Furthermore, these data sets provide extensive spatial coverage, particularly in the sparsely vegetated northwest inland regions, surpassing that of existing mainstream products like MODIS ET and ESA CCI SSM (K.Liu et al., 2023;Xu et al., 2019).
Figure 4b illustrates that the R values of our daily ET and SSM data sets are 0.537 and 0.664, respectively.Specifically, the daily ET product perform better in semi-humid and humid regions compared to semi-arid and arid regions.It effectively prevents overestimation in arid regions due to reduced sensitivity to meteorological inputs and utilization of surface thermal characteristics (Laipelt et al., 2021;K. Liu et al., 2022).
Compared to ESA CCI SSM, our reconstructed product not only demonstrates enhanced accuracy but also offers better spatial resolution and coverage (K.Liu et al., 2023).Verification results show that the arid region achieves the highest accuracy, characterized by maximum R and minimum RMSE and MAE, followed by semi-humid and humid regions, with the semi-arid region ranking last.Overall, considering the inherent spatial heterogeneity of field measurements, the accuracy of these two daily products remains at an acceptable level.

Water Resources Research
10.1029/2023WR035234 BO ET AL.

Evaluation of IWU at Field Stations
Verifying the proposed method at field stations before scaling it up is essential since field measurements provide actual data.As shown in Figure 5a, among the five selected explanatory factors, ET exhibits the highest importance across all five ML models, emphasizing its critical role in the hydrological cycle (X.Wu et al., 2015).The importance of the other explanatory factors varies among the models.For example, while precipitation holds the second-highest importance in the SVM model, it is less crucial in the MLR model.Such variations are expected due to the distinct structures of ML models, leading them to select features using different strategies (Meng et al., 2022).
Figure 5b exhibits the specific estimation precision of the five ML models and the BTCH integration models.Multi-year metrics reveal that BTCH shows the most satisfactory performance, with the highest R 2 and KGE″, as well as the lowest RMSE and MAE among all models.Among the remaining models, RF, SVM, and ANN all exhibit acceptable accuracy, while XGB and MLR show relatively lower performances.Collectively, these results highlight the feasibility of estimating IWU using the proposed method, especially considering the modeling challenges posed by the spatial heterogeneity of field stations.

Evaluation of IWU at the City Level
Figure 6 displays the importance of explanatory factors at the city level by extending the proposed method to the national scale.Similar to the models at field stations, the emphasis on various explanatory factors varies inconsistently among different ML models.The importance of lnSSM and lnRZSM at the national scale has increased compared to field stations, while the significance of ET shows the opposite trend, particularly in the MLR model.This shift may be attributed to the broader spatial impact of soil water storage dynamics.Regarding different crops, the distribution of permutation-based importance remains consistent between rice, wheat, and maize within the same model.For example, in the XGB model, all three cereal crops assign the highest importance to lnSSM (0.4, 0.39, and 0.39, respectively), while paying less attention to lnRZSM (0.09, 0.12, and 0.1, respectively).
Figure 7 displays the spatial distribution of predicted IWU at the city level.Geographically, the spatial patterns of estimates from each model remain largely consistent on a national scale.High IWU estimates for rice are clustered in the Lower-and-Middle reaches of the Yangtze River and the eastern coastal areas, while high IWU estimates for wheat and maize are concentrated in the North China Plain.However, the estimates of different models still show some discrepancies at a local level.Figure 8 compares the predicted IWU from each model with the reported IWU for the three cereal crops.Rice exhibits the highest estimation accuracy, with R 2 ranging from 0.78 to 0.83 across the six models (average of 2003-2013), followed by wheat (0.68-0.76) and maize (0.53-0.64).Consistently, the KGE″ also indicates decreasing accuracy in the order of rice, wheat, and maize.Despite rice's higher RMSE and MAE compared to the other crops, its estimation is considered the most reliable due to the higher irrigation needs for rice cultivation (Yin et al., 2022).Among the models, the BTCH method consistently yields the most robust results, with metrics ranking highest among the six models and exhibiting stable interannual errors.For instance, in rice, BTCH achieves the lowest RMSE, the second lowest MAE, the highest R 2 and a moderate KGE″.While the RF model also performs well in rice, it shows larger interannual errors than BTCH.These results highlight the excellent integration capability of the BTCH method.However, the performance of other ML models varies across different crops.The MLR model performs best in maize but ranks second to BTCH in rice and wheat.The SVM model performs well in rice but poorly in maize.Combining Figure 7, these results manifest the feasibility of using ML method for large-scale IWU estimation.Our study also shows the feasibility of using ML model varies with the environmental context.Nevertheless, the BTCH method effectively integrate these models to achieve more robust estimations (Shangguan et al., 2023).Additionally, we compared BTCH with an equal weighting approach (Table S2 in Supporting Information S1), affirming that BTCH delivers superior performance.The weights assigned by BTCH to various models (Table S3 in Supporting Information S1) tend to favor more accurate models, reducing systematic errors and resulting in higher weights.However, random errors also affect simulation results, leading to weights that may not perfectly correlate with model accuracy.
Figure 9 illustrates model accuracy across various climatic regions, revealing differences in reported IWU averages for the three cereal crops based on aridity levels.Rice exhibits higher IWU requirements in humid regions compared to arid ones, resulting in elevated RMSE and MAE values for rice estimation in humid areas.However, there is no significant variance in the R 2 and KGE″ values for rice across different climatic regions, suggesting minimal impact of aridity levels on the model's performance in estimating rice IWU.Conversely, wheat and maize consistently demonstrate reduced RMSE, MAE, and average reported IWU with increasing humidity levels.Higher R 2 and KGE″ for wheat are observed in semi-arid and semi-humid regions, with similar trends observed for maize.Collectively, these IWU estimation biases result from a combination of climate-related fluctuations, crop-specific attribute, and the suitability of remote sensing products across different regions.Wheat and maize are sensitive to the local climate conditions, significantly affecting the accuracy of our estimation approach.In contrast, rice, thriving in humid conditions, maintains consistent performance across diverse climatic regions.This consistency may stem from the unique requirements of rice cultivation.Rice farmers often employ measures to preserve a humid planting environment, even in arid regions, through natural or artificial irrigation.As a result, while rice cultivation spans various climatic zones, the climate variations at the individual farm level may be less pronounced.Moreover, the quality of remote sensing products varies across climate regions, influencing IWU estimation accuracy.Our method tends to perform less effectively for wheat and maize in arid regions, largely due to the limited accuracy of ET products in such environments.Rice consistently outperforms wheat and maize, demonstrating greater drought resistance compared to wheat.Wheat performs well in semi-arid and semi-humid regions, with similar performance observed for maize.Variable importance analysis across different climate scenarios further supports these conclusions (Figure S5 in Supporting Information S1).
We analyzed the total national IWU using both reported data and predicted data (after BTCH aggregation) from 2003 to 2013, as shown in Figure S6 in Supporting Information S1.Among the three cereal crops, the total reported IWU for rice and maize demonstrates a noticeable upward trend, with only a few years showing decreased IWU.Conversely, the total reported IWU for wheat displays a fluctuating pattern with a slight overall upward trend, though statistically insignificant.The predicted results mirror these overall trends, indicating the capability of the proposed framework to capture the interannual variability of IWU.

Model Uncertainty Analysis
Both ET and precipitation are crucial factors in estimating IWU (K.Liu et al., 2019).To assess the impact of different products on model performance, we replaced the original input parameters at each time step with various alternatives, including MODIS ET, PML ET, ERA precipitation, and TRMM precipitation.The spatial patterns of these products can be found in Figures S7 and S8 in Supporting Information S1.Additionally, the spatial patterns Figure 10 illustrates the effects of substituting different products on model performance, revealing noticeable changes, particularly across different crops.When replacing the original ET input (SEBAL ET) with MODIS ET or PML ET, a decline in model performance is observed for rice and maize.Conversely, both ET products decrease RMSE while increasing R 2 in wheat.However, the mean relative change is negligible, with values below 3% for RMSE and less than 1% for R 2 and KGE″, indicating that substituting the ET products may not substantially affect IWU estimation in wheat.
Comparing the original precipitation products (i.e., GPM) to TRMM precipitation, TRMM appears more suitable for estimating IWU in maize.This is evidenced by decreased RMSE and MAE, as well as increased R 2 and KGE″.Both precipitation products show similar relative changes in rice, but deteriorated metrics suggest that the GPM precipitation product is more suitable for rice IWU estimation compared to ERA and TRMM.However, replacing the precipitation product does not obviously improve or worsen the model's performance in the case of wheat, as the relative changes between the two precipitation products are ambiguous.In general, our initial ET product proves more conducive to model performance compared to MODIS ET and PML ET.While the uncertainty in precipitation products affects model accuracy, none of the individual products demonstrate a clear advantage over the others.Moreover, substituting ET and precipitation products does not fundamentally alter model performance (mean relative change <5%), highlighting the robustness of our proposed method.We further examined the impact of these products on model performance across different climatic regions (Figures S10 and S11 in Supporting Information S1).Changes in model performance remain within a stable range (<10%).However, there are specific scenarios with significant fluctuations that require attention, such as the choice of ET products when modeling wheat IWU in humid regions and the selection of precipitation products when modeling maize IWU in arid regions.
Our study also reveals an interesting aspect of ML model performance: an asymmetry in its predictive capabilities.Specifically, the ML model excels in accurately capturing intermediate IWU values but tends to underestimate high values within the IWU range.This observation suggests that while the model effectively captures the dominant factors contributing to IWU, it may overlook subtle signals crucial for representing extreme IWU values, particularly those that are exceptionally high.There is no discernible pattern in the residuals of different ML models (Figures S12 and S13 in Supporting Information S1).However, all models exhibit a common pattern where overestimation or underestimation is most pronounced in the high reference IWU range with sparse samples.Such asymmetrical performance appears to be a common characteristic shared among various ML models.This phenomenon can be attributed, at least in part, to the limited availability of data pertaining to extreme regimes.The scarcity of data points in the high IWU range may hinder the ML model's ability to adequately learn and generalize patterns associated with extreme values.
The number of samples may also affect the performance of ML models and Bayesian methods.To clarify this issue, we varied the numbers of samples used to train ML model or generate BTCH weights.The results (Tables S4 and S5 in Supporting Information S1) indicate that the number of samples (IWU simulations provided by different ML models) does not significantly affect the integration capability of the BTCH method.However, overall model performance tends to decrease as the number of samples (reference IWU and related factors) decreases.This decrease in performance may stem from the challenge posed by small sample sizes, which can hinder the ML model from capturing complex patterns in the data, leading to inadequate fitting (K.Chen et al., 2020).

Discussions
This study introduced two processes to enhance the detection of irrigation-related information.First, we addressed the potential nonlinear relationship between SM and IWU by transforming the original remote sensing SM product into its logarithmic form.Second, we developed a robust parameter called ResSM, intended to capture differences between satellite-based and reanalysis-based SM products in detecting irrigation events, particularly during rain-free periods.A similar study is discussed in S. V. Kumar et al. (2015), which validated the capability of remote sensing data sets to identify irrigation signals through a Kolmogorov-Smirnov test, comparing remote sensing SM with model-generated SM.While traditional approaches typically employed the extracted information as a parameter in water balance equations (Zohaib & Choi, 2020), our strategy involved incorporating ResSM as a feature to drive the ML model.
To assess the effectiveness of these theory-guided mechanisms, we compared the model's performance under different scenarios: one where the original SM was used instead of the natural logarithmic-transformed SM, and another where ResSM was excluded from the model.These scenarios were compared to the initial model that incorporates both mechanisms.As demonstrated in Figure 11, omitting these two physics-based processes generally results in increased RMSE and MAE for all three cereal crops, accompanied by decreased R 2 and KGE″.This change is particularly pronounced in the case of maize but less conspicuous in rice and wheat.The discrepancy can be attributed to rice and wheat already achieving relatively satisfactory metrics compared to maize, making significant improvements more challenging.Our results indicate that these theory-guided strategies enhance the model's performance.Converting the original SM into its logarithmic form led to a 1% and 4% improvement in KGE″ for rice and maize, respectively.However, for wheat, there was a 0.1% decrease in KGE″, primarily observed in arid regions, suggesting that the logarithmic transformation may be less suitable for such areas.Additionally, including ResSM resulted in a 0.4%, 0.7%, and 2% increase in KGE″ for rice, wheat, and maize, respectively.It's worth noting that we employed a simple linear model to correct the systematic deviation between satellite and reanalysis products.Future research could explore advanced correction methods and utilize typical reanalysis data sets such as the China Meteorological Administration Land Data Assimilation System (CLDAS) to enhance the accuracy of ResSM data extraction.The reliability of our IWU estimation is substantiated by comparing our results with those of two earlier studies (Y.Chen et al., 2019;K. Zhang et al., 2022) (Figure S14 in Supporting Information S1).This comparison reveals a general consistency between our study and prior research.Notably, regions characterized by high IWU, such as Xinjiang, Henan, Shandong, and Jiangsu, consistently emerge as areas with high irrigation intensity.
Previous studies (C.Zhang & Long, 2021;K. Zhang et al., 2022;Zohaib & Choi, 2020) focusing on IWU estimation often treated all crops as a single category due to challenges in discerning irrigation signals among different crops based on hydrological mechanisms.The irrigation area data used in these studies might not adequately distinguish between various types of croplands.To overcome these limitations, our research focuses on estimating IWU for three staple cereal crops in China, leveraging phenological information to extract parameters with higher precision.However, while ChinaCropPhen1km data set excels in monitoring phenological event, its capacity to differentiate the spatial distribution of crops is somewhat limited (Luo et al., 2020).To gauge Water Resources Research 10.1029/2023WR035234 the potential uncertainty associated with the ChinaCropPhen1km data set, we analyzed the cultivated area in each city as determined by the data set and compared it to the reported irrigated area provided by the Chinese Ministry of Water Resources.Figure 12 illustrates the results, revealing a strong linear relationship between the cultivated area calculated by ChinaCropPhen1km and the reported irrigated area, with a r of 0.82, 0.88, and 0.77 for rice, wheat, and maize, respectively.However, the cultivated area calculated by ChinaCropPhen1km generally exceeded the reported irrigated area, possibly due to the inclusion of rain-fed cropland in the data set.Since ML methods often employ standardized preprocessing of parameters, the impact of this overestimation of irrigated  area may be mitigated.These findings suggest that the ChinaCropPhen1km data set performs reasonably well overall and enables crop-specific IWU estimation.
Our study aims to estimate IWU using ML methods, emphasizing its viability across different crops and on a large scale.Unlike traditional hydrological approaches reliant on complex physical relationships, ML methods offer improved robustness and transferability by avoiding the need for such elusive connections.Among the ML methods we investigated, the BTCH integration method emerges as particularly promising, demonstrating its potential within model ensembles.In addition to its integration capabilities, the BTCH method excels in harmonizing data from multiple remote sensing products, thereby reducing uncertainty (J.Liu et al., 2021;Shangguan et al., 2023).Given the uncertainty associated with multisource precipitation and ET products in our study, employing the BTCH method to synthesize these data sources could further enhance the accuracy of our model.
Due to the limitations of the available reported IWU data at the city level, seamlessly extending our proposed method to the pixel level may pose challenges.Future efforts will concentrate on bridging this gap by integrating ML models with traditional hydrological methods, aiming to extend ML approaches to a finer pixel level.Additionally, with the rapid advancements in deep learning networks, exploring the application of more sophisticated models like the Long Short-Term Memory model could prove valuable, given its promising performance in simulating time series data (Cao et al., 2021;Guo et al., 2016).

Summary and Conclusions
This study presents a hybrid framework for estimating the IWU of three staple cereal crops in China.Using ML techniques, we generated a high-quality time series data set for daily ET and SSM.Subsequently, we transformed the original SM into its logarithmic terms and extracted a novel irrigation-related parameter based on differences in detecting irrigation events between remote sensing SM products and reanalysis SM products.By coupling theory-guided strategies with ML techniques, the proposed framework accurately estimated IWU at both stationspecific and national scales, with the BTCH method proving to be the most robust among the six models explored, demonstrating exceptional ensemble ability.Incorporating two physics-based processes effectively highlighted implicit irrigation information within SM data, enhancing overall model performance.These findings underscore the potential of the proposed framework for estimating IWU for staple cereal crops in China.
The performance of our hybrid framework varied with regional dryness levels.We observed higher accuracy in estimating IWU for wheat and maize in semi-arid and semi-humid regions, while rice IWU estimation remained insensitive to humidity variations.In other words, wheat and maize showed greater sensitivity to dry conditions compared to rice, posing challenges in applying the framework in northern arid regions where wheat and maize dominate crop production.Additionally, we found that model accuracy was impacted by uncertainty in ET and precipitation products.Our daily SEBAL ET product demonstrated better applicability compared to MODIS and PML ET products.However, the effect of multisource precipitation products varied across different crops.These uncertainties did not compromise the model's accuracy fundamentally, with the mean relative change consistently less than 5%.
Overall, our hybrid framework for estimating IWU holds promise in addressing food security and promoting socio-economic stability, especially amidst mounting population growth and increasing food demand.Given prevailing water scarcity issues in China, such as groundwater depletion in the North China Plain due to prolonged overexploitation (Bai et al., 2022), our study offers valuable insights for effective water resource management.Furthermore, the proposed strategies hold potential for broader application beyond China's borders, with future endeavors focusing on enhancing the framework and exploring its applicability on a global scale.

•
Hybrid framework is developed to estimate irrigation water use (IWU) for three staple cereal crops in China • Machine learning is employed to drive multiple remote sensing-derived products for precise IWU estimation • Proposed framework accurately estimates IWU and incorporates theory-guided module to reveal implicit irrigation signal Supporting Information: Supporting Information may be found in the online version of this article.
a range of ML methods, augmented by the integration of Bayesian mechanism.The main objectives of our study are threefold: (a) to generate high resolution time series of ET and SSM using remote sensing data sets; (b) to assess the effectiveness of theory-guided data-driven models in estimating IWU across different crops in China; and (c) to evaluate the uncertainty associated with the proposed model.

Figure 1 .
Figure 1.Study region.(a) Distribution of 12 stations along with the cultivation regions of three cereal crops.The dark blue region represents areas where maize and wheat rotation are prevalent.(b) Proportion of irrigation water use for the three cereal crops in each province.(c) Distribution of the Aridity index across the study region.

Figure 2 .
Figure 2. Flowchart of the proposed framework.

Figure 3 .
Figure 3. Relationship between irrigation water use and the natural logarithm form of (a) surface soil moisture and (b) root zone soil moisture at field stations.The blue line with shading represents the regression line and prediction interval, while the cyan shading indicates the confidence interval.

Figure 4 .
Figure 4. Evaluation of daily evapotranspiration (ET) and surface soil moisture (SSM) data sets.(a) Spatial distribution of average ET and SSM products during 2003-2013.(b) Accuracy metrics evaluated with field observations.

Figure 5 .
Figure 5. Verification of model applicability at field stations.(a) Permutation-based importance of explanatory factors.The error line represents the standard deviation of 2008-2014.(b) Scatter plots of reported irrigation water use (IWU) and predicted IWU in each year.The number after the "±" represents the standard deviation of 2008-2014.

Figure 6 .
Figure 6.Permutation-based importance of explanatory factors at the city level.The error line represents the standard deviation of 2003-2013.

Figure 7 .
Figure 7. Spatial patterns of average irrigation water use estimates in 2003-2013 using different models.(a) Rice, (b) wheat, and (c) maize.

Figure 8 .
Figure 8. Metrics of six models at the city level.The box-plots represent the interquartile range (box), mean and median (diamond and line within the box) of 2003-2013.

Figure 9 .
Figure 9. Metrics of Bayesian three-cornered hat with respect to dryness level.The bar and error line respectively represent the average and standard deviation of 2003-2013.

Figure 10 .
Figure 10.Changes in model performance (Bayesian three-cornered hat) due to different evapotranspiration (ET) or precipitation products, compared to using SEBAL ET and Global Precipitation Measurement precipitation products.The box-plots represent the interquartile range (box), mean and median (diamond and line within the box) of 2003-2013.

Figure 11 .
Figure 11.Mean relative change in Bayesian three-cornered hat metrics when excluding the soil moisture residuals feature or replacing lnSM with original soil moisture, in comparison to the initial model incorporating physical mechanisms.

Figure 12 .
Figure 12.Comparison between reported irrigated area and cultivated area calculated from ChinaCropPhen1km, with the black line representing the linear regression fitting.

Table 1
List of Available Data Sets Used in This Study

Table 2
Description of the Cropland Field Stations