Uncovering Current and Future Variations of Irrigation Water Use Across China Using Machine Learning

Accurately characterizing changes in irrigation water use (IWU) is crucial for formulating optimal water resource allocation policies, particularly in the context of climate change. However, existing IWU estimation methods suffer from uncertainties due to limited data availability and model constraints, restricting their applicability on a national scale and under future climate change scenarios. We present a robust framework leveraging machine learning and multiple data sets to estimate IWU across China. Forced with an ensemble of climate and socio‐economic projections, we appraise future trends and additional costs of IWU. Our model shows high accuracy in reproducing IWU, with coefficient of determination (R2) ranging from 0.86 to 0.91 and root mean square error from 0.261 to 0.361 km3/yr when compared to reported values in Chinese prefectures. Independent validation at 11 cropland sites further confirms the model's predictive power (R2 = 0.67). Under different emissions scenarios, China's IWU is projected to increase by 8.5%–17.1% (6.8%–34.8%) by 2050s (2100s) compared to the historical period (1981–2010), with higher emissions leading to more significant increases. This rise in IWU by 2050s (2100s) comes with an estimated additional cost of US $1.65–3.91 ($2.28–6.5) billion/year, highlighting the urgency for sustainable water management. Our study provides an effective approach for estimating current and future IWU using machine learning techniques, transferable to other countries facing increasing irrigation demands.


Introduction
Globally, ∼70% of freshwater withdrawals and ∼90% of anthropogenic freshwater consumption are used for irrigated agriculture, which accounts for only ∼20% of arable land but provides more than 40% of food production (FAO, 2018; Siebert et al., 2010). Notably, China has the largest irrigated area (∼34.27 billion hectares) worldwide, with irrigated farms covering more than 50% of the nation's farmland, nourishing around 75% and 90% of its food and economic crops, respectively, and accounting for 63% of the country's total water usage (Portmann et al., 2010; Zhu et al., 2013). Irrigation water use (IWU) plays a critical role in crop growth and yield, especially in regions facing water scarcity, where recurrent droughts and extreme temperatures can exacerbate its impacts. In general, IWU represents the actual amount of water entering the field, encompassing both surface water and groundwater irrigation. Notably, the over-exploitation of groundwater induced by increased IWU poses a common challenge in critical zones with inadequate surface water resources, such as northern China, northern India, and Central America (Calzadilla et al., 2010; Wada et al., 2013).
The World Bank estimated that the global cost of IWU adaptation was approximately US $30 billion in 2017 and that it would reach $150 billion by the 2030s (Group, 2021). Hence, reliable characterization of IWU is of paramount importance, not only for optimizing cropping systems, irrigation planning, and water allocation to mitigate potential reductions in crop yields but also for facilitating the expansion of feasible cropland areas and supporting multiple cropping cycles (Rosa, 2022; Zhu, Burney, et al., 2022). Moreover, with global climate change threatening agricultural productivity and food security (Jägermeyr et al., 2021; Wheeler and von Braun, 2013), the expansion of irrigation to underperforming rainfed croplands has been proposed as a potential solution to meet future global food demand (Rosa et al., 2020; X. Wang et al., 2021). However, precise future IWU estimates are limited by model imperfections, thereby impeding policymakers' ability to grasp the challenges and opportunities of sustainable water resources management under climate and socioeconomic change (Elliott et al., 2014; Flörke et al., 2018).
IWU can be obtained in a number of ways. Traditionally, it is collected from standard surveys of farmland irrigation practices or from water consumption statistics provided by irrigation districts. Nevertheless, the limited availability of in situ data, often due to its private nature, has restricted its broader applicability and utilization in scientific studies. To complement in situ data, hydrological and agricultural models have incorporated irrigation modules, such as PCRaster Global Water Balance (PCR-GLOBWB) (Wada et al., 2014) and the Global Crop Water Model (GCWM) (Siebert & Döll, 2010). These models have offered valuable insights into irrigation activities. However, inconsistencies in the input data used in these models, such as varying data sets of evapotranspiration (ET) or soil moisture (SM), have led to discrepancies in IWU estimates, affecting water and energy budget assessments and climate dynamics analyses. Furthermore, the diversity of cropland areas, irrigation infrastructure, and irrigation schemes introduces challenges in obtaining spatially explicit IWU estimates (Rosa, 2022; Zhang & Long, 2021).
Remote sensing has emerged as a promising tool for calculating IWU by employing statistical relationships with remotely sensed parameters or integrating data assimilation techniques within numerical or process models. These methodologies, based on principles such as energy balance, water balance, and water vapor diffusion, have shown significant potential in recent research (Chen et al., 2019; Zaussinger et al., 2019; Zhang & Long, 2021; Zhang et al., 2022). A common approach to IWU estimation defines it as the residual of ET and SM dynamics minus precipitation. While this method is relatively straightforward, it has limitations in characterizing the dynamic processes that evolve across growth stages and is more suitable for areas with high vegetation coverage (Foster et al., 2020; Zohaib & Choi, 2020). Researchers have explored the potential of incorporating additional climatic and hydrological information, such as air temperature and snowpack, to capture IWU variations in greater detail and achieve improved accuracy (Y. Liu et al., 2022; Zhang et al., 2021). However, the scarcity of long-term records and constraints in existing models have hindered more comprehensive IWU estimation efforts.
A renaissance in machine learning (ML) technology has opened new possibilities for capturing nonlinear relationships between target variables and explanatory factors, surpassing the capabilities of conventional regression models (Reichstein et al., 2019). ML approaches, either as purely ML models or as hybrid models that couple ML with process-based models, are beginning to be used to estimate hydrological and biogeochemical parameters in agricultural ecosystems, including nitrate concentrations (Knoll et al., 2020; Sadayappan et al., 2022), soil moisture (Ahmad et al., 2010), and streams (R. Wang et al., 2021). The integration of ML into computational modeling holds great promise, as it enables the extraction of complex interactions and facilitates spatial generalization of process-based model outputs (Reichstein et al., 2019). Notably, ML has the potential to enhance the estimation of IWU by considering irrigation activities and crop growth patterns, since it can model intricate relationships among multiple variables and leverage large data sets of environmental information. One of the primary challenges in utilizing ML for IWU estimation is the demand for substantial amounts of data. The success of ML models relies on the availability and reliability of input data, as well as the appropriateness of the model structure, all of which significantly influence model performance (Al-Jarrah et al., 2015).
This study aims to fill the aforementioned knowledge gaps by developing a robust ML-based approach, which utilizes a combination of satellite remote sensing, meteorological drivers, economic statistics, and numerical model outputs to estimate IWU at a national level with a spatial resolution of 0.25°. To evaluate the efficacy of the model, the study examines 339 prefectures in China during the period 2011-2013, and the results are further validated against independent observations from 11 cropland sites. We extend the application of the well-established ML-based model to project future IWU until the year 2100 under a range of climate and socioeconomic scenarios. The outcomes of these projections provide the variations, trends, and associated costs of IWU across China in the coming decades. Our findings provide accurate IWU estimation for China and contribute to globally consistent IWU estimates. Additionally, they inform policy and decision-making, guiding sustainable water resources management amidst climate change and socioeconomic dynamics.
Earth's Future 10.1029/2023EF003562 LIU ET AL.

Materials and Methods
Mainland China (Figure 1a) comprises 31 provinces. Precipitation in China is unevenly distributed, and dry regions are primarily located in the west and north (Figure S1 in Supporting Information S1). The East Asian and South Asian monsoons, occurring from April to October, bring a pronounced increase in precipitation. Agriculture is a major part of China's economy, with an emphasis on irrigation. The principal crops in the country include rice, wheat, corn, and soybeans. The country's diverse agricultural practices and vast geographical and climatic variations result in a considerable variety of crop types and crop rotations. The agricultural landscape in China can be categorized into main agro-ecological regions, including the northern grain belt, the southern rice-growing region, and the western arid and semi-arid areas. Each region possesses unique crop types and cropping systems adapted to local conditions and requirements (Text S1 in Supporting Information S1).
With a growing population and a warming climate, irrigation water use has been increasing in China, and rising agricultural water scarcity has been adversely affecting food security (Figure 1b and Figure S2 in Supporting Information S1) (Qi et al., 2022). Groundwater is a crucial resource for agricultural irrigation in the water-scarce regions of China, but groundwater depletion has been occurring due to extensive human activities, particularly in the Northern Plain and Haihe Plain (Feng et al., 2013; Liu et al., 2021).
To accurately calculate the national IWU and investigate the variations and trends of IWU by the end of this century, we conducted the following three procedures (Figure 1c): (a) examining the feasibility of the IWU calculation model based on field measurements, satellite observations (e.g., precipitation, ET, and soil moisture), and the reported IWU data set; (b) developing and evaluating the ML-based models; (c) extending the proposed model to future simulations and assessing the impacts of climate change and socioeconomic development on IWU.

Materials
A variety of data sets were used during the study period, including gridded precipitation (P), ET, and SM; meteorological data (i.e., maximum air temperature (Tmax), minimum air temperature (Tmin), and shortwave radiation (SRAD)); snow water equivalent (SWE); phenology data; irrigated area fraction; and county-level irrigation area. All these data except irrigated area fraction and irrigation area were unified to a spatial resolution of 0.25°. We also obtained the annual reported prefecture-level IWU along with two previously generated IWU products to evaluate the accuracy of our work. Moreover, a set of field measurements at 11 crop sites across China was collected to validate the proposed model. In addition, four earth system model-based simulations under three emission scenarios were used to extend the proposed model into the future. Details of the listed data sets can be found in Tables S1 and S2 in Supporting Information S1.

Precipitation
We used the precipitation data set during 1981-2013 from ERA5 (the new fifth-generation atmospheric reanalysis of ECMWF) for model establishment and evaluation. These reanalysis products can provide detailed information about the vertical atmospheric field derived from continuously optimized analysis systems and data assimilation activities (Kidd & Levizzani, 2011; Xie et al., 2018). As the latest reanalysis product released by the Copernicus Climate Change Service, ERA5 has a more advanced assimilation system and higher spatial-temporal resolution (Jiang et al., 2021). Although ERA5 gridded precipitation has been widely utilized, we evaluated its applicability at the prefecture level based on 836 field precipitation measurements from the China Meteorological Administration and observed acceptable accuracy (Figure S3 in Supporting Information S1).
Considering the uncertainty in the precipitation data set, two precipitation data sets from the Climatic Research Unit (CRU) and the Global Precipitation Climatology Centre (GPCC) were obtained for uncertainty analysis by comparing them with those from the ERA5 product. The CRU data set, produced by the University of East Anglia, provides a number of variables, including monthly precipitation at 0.5° × 0.5° resolution since January 1901 (Asoka et al., 2017). The GPCC monthly precipitation product is derived from radiometric observations collected from satellites and rain gauges since 1951 (about 85,000 stations worldwide) with a spatial resolution of 0.25° × 0.25° (Ajaaj et al., 2015).

Evapotranspiration
The ET data set during 1981-2013 was derived from GLEAM (Global Land-surface Evaporation Amsterdam Methodology), which provides global daily ET estimates at 0.25° spatial resolution. Driven primarily by satellite data sets, GLEAM takes soil moisture as the constraint for simulating evaporation (Yang et al., 2017). Three other ET data sets, from NTSG (Numerical Terradynamic Simulation Group), TerraClimate, and SYNET, were used for further uncertainty analysis. The NTSG data set is primarily based on a modified Penman-Monteith approach that integrates parameters from eddy covariance flux towers and satellite observations (Zhang et al., 2010). The TerraClimate ET is derived from a modified Thornthwaite-Mather climatic water-balance model. Its input parameters include precipitation, reference ET, and soil water capacity obtained from satellite observations (Zhao et al., 2019). The SYNET data set is a monthly ensemble synthesized from 12 ET products, including MODIS and NTSG, by using a global high-quality eddy covariance flux data set, and the performance of this product over China has been proven to be superior to local ET products (Elnashar et al., 2021).

Soil Moisture
We mainly used the surface soil moisture (SSM) from the European Space Agency Climate Change Initiative (ESA CCI) and the root zone soil moisture (RZSM) from ERA5 for model establishment. As part of the Climate Change Initiative, the ESA CCI SSM data set is based on microwave satellite observations, which convert the observed land surface brightness temperature to soil water content (Kovačević et al., 2020). Because the penetration capability of current microwave sensors is not enough to obtain accurate soil moisture in the plant root zone, we introduced the ERA5 RZSM product, which incorporates an advanced land data assimilation system to make up for the lack of satellite observations (Reder & Rianna, 2021).
Notably, although the ESA CCI SSM data set has comprehensive spatial coverage, some regions still have gaps. We used a machine learning approach to gap-fill this data set based on satellite observations, reanalysis model outputs, and climatic information. More details can be found in our previous study (Liu et al., 2023) and Text S2 and Figure S4 in Supporting Information S1.
We also used two SSM data sets from ERA5 and GLEAM for uncertainty analysis. Belonging to the same product family as ERA RZSM, ERA SSM reflects the soil water content of the first layer (∼7 cm) (Fan et al., 2022). GLEAM, a process-based semi-empirical model, estimates land evaporation and its components globally from remote sensing observations. The global soil moisture data set based on GLEAM has been widely used (Martens et al., 2018).

Other Meteorological and Hydrometeorological Data
We obtained a variety of meteorological variables, including maximum air temperature, minimum air temperature, and short-wave radiation, from the ERA5 data set.
We used satellite-derived SWE during 1981-2013, which is generated based on satellite products and machine learning.For this SWE data set, a large number of in situ observations and five snow depth data sets are used to train and apply the machine learning algorithm, including Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR snow depth from the Northern Hemisphere snow depth (NHSD)), ERA-Interim and Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) (Hu et al., 2021).

Phenology Products
The land surface phenology metrics used in our study were obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover Dynamics (MCD12Q2) Version 6 products. These metrics are derived from the time series of MODIS Nadir Bidirectional Reflectance Distribution Function-Adjusted Reflectance (Ganguly et al., 2010). These metrics enabled us to identify critical periods in the cropland growth cycle and irrigation activities, offering valuable insights into vegetation growth timing, irrigation duration, and crop-water dynamics.

Irrigation Area, Irrigation Area Percentage, and Irrigation Water Use
We used annual prefecture-level IWU data from the Chinese Ministry of Water Resources report. It is crucial to note that this data set predominantly captures irrigation water withdrawal, which represents the total water diverted from a variety of water sources. Despite being recorded at the scale of the entire irrigated area or administrative division, this data set remains the most reliable source for evaluating IWU calculation across China (Qi et al., 2022; Zhou et al., 2020). Although the reported data primarily reflect irrigation water withdrawal rather than the exact IWU, this discrepancy does not affect the performance and evaluation of our model, because the same data set is used for both the establishment and validation of the model. Additional clarification on the meanings of different irrigation-related definitions can be found in Text S3 in Supporting Information S1.
We obtained the grid fraction of irrigation area from the Global Map of Irrigated Area (GMIA) product for 2000. GMIA is mainly derived from the statistical reports of the Food and Agriculture Organization of the United Nations, the land use data set from the United Nations Ministries of Agriculture, and the land cover data set from the United States Geological Survey (Meier et al., 2018). To calculate the ET for different future scenarios, we also obtained the land cover fraction based on the combination of ESA GlobCover 2009 maps and the GMIA irrigation area fraction.
Two previously generated IWU products were also used for evaluating the accuracy of our work. One is the IWU data set from Zhang et al. (2022); the other is from Chen et al. (2019). These two data sets are based on satellite data sets and process-based models.

Field Measurements
A set of field measurements of IWU at 11 crop field stations across China during 2008-2014 was used to evaluate the proposed model. Details of the selected crop sites can be found in Table 1. The SSM and RZSM at these stations were collected from field measurements, and the meteorological variables (i.e., precipitation, maximum air temperature, minimum air temperature, and shortwave radiation) were derived from nearby meteorological stations.
The ET at each station was calculated using equations based on Teo et al. (2022) and Zhang et al. (2001), simplified as follows:
$$\mathrm{ET} = P \times \frac{1 + w\,\dfrac{\mathrm{PET}}{P}}{1 + w\,\dfrac{\mathrm{PET}}{P} + \dfrac{P}{\mathrm{PET}}} \tag{1}$$
where P is precipitation; w is the available water coefficient; and PET is potential ET, which was calculated using the FAO Penman-Monteith method:
$$\mathrm{PET} = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)} \tag{2}$$
where Δ represents the slope of the saturation vapor pressure curve versus temperature; T is the mean air temperature; e_s and e_a are the saturation and actual atmospheric water vapor pressures, respectively; R_n and G are the surface net radiation and the all-wave ground heat flux, respectively; γ is the psychrometric constant; and u_2 is the surface wind speed.
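As an illustration of the station-level calculation described above, the following minimal Python sketch chains a FAO Penman-Monteith PET estimate into a Zhang et al. (2001)-type ET curve. The function names, the default psychrometric constant, and the unit choices (MJ m-2 day-1 for radiation, kPa for vapor pressure, mm for water depth) are assumptions made for this example and are not taken from the study.

```python
import math


def saturation_vapor_pressure(t_c):
    """Saturation vapor pressure (kPa) at air temperature t_c (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def fao56_pet(t_mean, rn, g, u2, es, ea, gamma=0.067):
    """Daily reference ET (mm/day) from the FAO Penman-Monteith equation.

    Assumed units: rn and g in MJ m-2 day-1, u2 in m/s, es/ea in kPa,
    gamma in kPa per deg C (0.067 is an assumed near-sea-level default).
    """
    # Slope of the saturation vapor pressure curve at t_mean (kPa per deg C).
    delta = 4098.0 * saturation_vapor_pressure(t_mean) / (t_mean + 237.3) ** 2
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


def zhang_et(p, pet, w):
    """Actual ET from a Zhang et al. (2001)-type curve; p and pet share units."""
    if p <= 0 or pet <= 0:
        return 0.0
    ratio = pet / p
    return p * (1.0 + w * ratio) / (1.0 + w * ratio + 1.0 / ratio)


if __name__ == "__main__":
    pet_day = fao56_pet(16.9, 13.28, 0.14, 2.078, 1.997, 1.409, gamma=0.0666)
    print(f"PET: {pet_day:.2f} mm/day")
    print(f"ET: {zhang_et(800.0, 1000.0, 0.5):.1f} mm/yr")
```

The guard against non-positive P or PET is a convenience of this sketch; how the study handled such cases is not stated.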

Future Scenario Simulations
We used various simulations from earth system models within Phase 6 of the Coupled Model Intercomparison Project (CMIP6), which provide a variety of future scenarios based on different assumptions regarding climate mitigation policies, socio-economic development, and global governance (Cook et al., 2020). We used SSP1-26 (low-end forcing pathway, warming target of less than 2°C), SSP3-70 (medium to high-end forcing pathway, warming target of 4°C), and SSP5-85 (high-end forcing pathway, warming target of around 5°C) to represent emission and land-use scenarios with radiative forcing stabilized at 2.6, 7.0, and 8.5 W m−2, respectively. Considering the range of possible futures and the structural independence of the models, we selected four extensively used earth system models, that is, CESM2, MIROC6, MIROC-ES2L, and TaiESM1 (Table S2 in Supporting Information S1). Four meteorological variables, that is, precipitation, maximum air temperature, minimum air temperature, and shortwave radiation, and three other hydrological variables, that is, SWE, SSM, and RZSM, were used. We selected the historical period from 1981 to 2013 and the future period from 2031 to 2100 to remove model perturbation. We focus on two phases of the future period: the mid century (2031-2050) and the late century (2081-2100).
Although earlier studies (Saha & Sateesh, 2022; Song et al., 2021) have used observations or reanalysis outputs to calibrate the CMIP6 data set, we did not conduct this calibration procedure, to avoid introducing other uncertainties. This is acceptable because our analysis is focused primarily on comparing future periods to baseline ones.

Methods
The proposed IWU calculation model is focused on efficiently establishing the intrinsic connection between IWU and explanatory variables:
$$\mathrm{IWU} = f(V_i) + \varepsilon \tag{3}$$
where V_i is the corresponding explanatory vector; f is a function, for which we used a machine learning-based regression; and ε represents the model residual.

Machine Learning (ML) Model
Compared to the traditional regression models that insufficiently consider the probability density functions, machine learning approaches could be more flexible in handling nonlinear relationships and complex interactions.
As an enhanced decision tree algorithm, random forest (RF) is a powerful means for interpreting earth variables. This model trains each tree on bootstrap resamples rather than relying on the whole data set (Tang et al., 2021).
During the model training stage, the bootstrap method is used to randomly sample data (Hutengs & Vohland, 2016). After multiple rounds of sampling and generating a corresponding number of decision trees, the final output is determined by the average value of all decision tree outputs (Figure S5 in Supporting Information S1):
$$Y = \frac{1}{k} \sum_{i=1}^{k} T_i(X_n) \tag{4}$$
where Y represents the final output, k represents the number of training rounds and decision trees, X_n is the selected feature, and T_i(X_n) is the output of decision tree i.
The RF model was implemented with the help of Python packages for our study. In particular, a Bayesian strategy based on the "Bayesian Optimization" module (Liu et al., 2023; Martinez-Cantin, 2014) was used to select the optimal hyperparameters.
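The averaging of bootstrap-trained trees described above can be illustrated with a deliberately small, standard-library-only sketch. It is not the study's implementation: each "tree" here is a one-split regression stump on a single feature, the per-split feature subsampling of a full random forest is omitted, and the hyperparameter search is not shown. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on a single feature."""
    best = None
    # Candidate thresholds exclude the maximum so the right branch is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        sse = (sum((y - mean(left)) ** 2 for y in left)
               + sum((y - mean(right)) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, mean(left), mean(right))
    if best is None:
        # Every x in this resample is identical: fall back to the mean target.
        constant = mean(ys)
        return lambda x: constant
    _, thr, low, high = best
    return lambda x: low if x <= thr else high


def fit_bagged_stumps(xs, ys, k=50, seed=0):
    """Train k stumps, each on its own bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict(trees, x):
    """Ensemble output: the average of all individual tree outputs."""
    return mean(tree(x) for tree in trees)


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 11)]
    ys = [1.0] * 5 + [5.0] * 5
    forest = fit_bagged_stumps(xs, ys)
    print(predict(forest, 2.0), predict(forest, 9.0))
```

In practice a library implementation with deeper trees and tuned hyperparameters would replace the stumps; the sketch only makes the bootstrap-then-average structure explicit.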

Model Feasibility
IWU is associated with a variety of explanatory variables relating to hydrological information (Chen et al., 2010).
Our model is based on the widely recognized water balance formula, which defines IWU as the residual between actual ET, precipitation, and changes in soil moisture (Döll & Siebert, 2002; Puy et al., 2021):
$$\mathrm{IWU} = \mathrm{ET} - P + \Delta \mathrm{SM} + D \tag{5}$$
where ΔSM is the change in soil moisture and D is the drainage, which includes both deep percolation and lateral infiltration and is closely related to soil moisture.
We also consider climatic variables like maximum and minimum air temperatures, and shortwave radiation to further refine our IWU calculations. Temperature has long been recognized as a critical factor affecting water consumption and drought stress risk during crop growth. With rising temperatures, the corresponding increase in irrigation water needs becomes crucial to maintain an adequate water supply for crops facing elevated water stress (Schauberger et al., 2017). Another key climatic variable, shortwave radiation, exerts significant influence on crop irrigation requirements by directly impacting photosynthesis rates and plant transpiration rates, thereby affecting the physiological processes of plants. As radiation levels fluctuate, irrigation water needs adjust accordingly, as these changes directly influence the water uptake and transpiration rates in plants. We also acknowledge the importance of snowmelt covering specific regions and its potential implications for crop irrigation water. Snowmelt alters regional water availability and surface albedo, directly influencing irrigation needs in those affected areas (Barnett et al., 2005).
We utilized essential explanatory factors, including precipitation, ET, soil moisture, irrigation area, maximum air temperatures, minimum air temperatures, shortwave radiation, and snowmelt, to drive our machine learning models. Calculating average values of these selected variables based on the irrigated area percentage, we focused on pixels with an irrigated area percentage greater than 1% to concentrate on regions significantly influencing irrigation water demand. Furthermore, our model is confined to specific phenological periods, capturing relevant variables that impact IWU during critical stages of crop growth, leading to a more accurate estimation of water requirements during key developmental phases.
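One possible reading of the pixel aggregation described above is an irrigated-area-weighted mean restricted to pixels whose irrigated fraction exceeds 1%. The sketch below follows that reading; the weighting scheme, the function name, and the handling of missing values are assumptions for illustration, since the text does not specify them.

```python
def irrigated_weighted_mean(values, irrigated_fraction, min_fraction=0.01):
    """Average pixel values within one prefecture, weighted by irrigated fraction.

    Pixels with an irrigated fraction at or below min_fraction (1% by default)
    or with a missing value (None) are skipped. Returns None if nothing is left.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for value, fraction in zip(values, irrigated_fraction):
        if value is None or fraction <= min_fraction:
            continue
        weighted_sum += value * fraction
        weight_total += fraction
    return weighted_sum / weight_total if weight_total > 0 else None


if __name__ == "__main__":
    # Third pixel (0.5% irrigated) is excluded from the prefecture average.
    print(irrigated_weighted_mean([10.0, 20.0, 99.0], [0.5, 0.5, 0.005]))
```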
The correlations between selected explanatory variables and IWU can be further reflected at the field crop scale (Figure 2a). Results indicate that all selected explanatory variables exhibit significant correlations with IWU, or correlations that approach statistical significance. Specifically, IWU is negatively correlated with precipitation but positively correlated with ET. Air temperature and shortwave radiation are positively related to IWU. These correlations are also supported at the prefecture level (Figure 2b), which illustrates that IWU increases with increasing ET, air temperature, and shortwave radiation but decreases with increasing precipitation.
We also used the permutation-based importance score generated by RF (Mi et al., 2021) to examine the connection between IWU and the selected variables. To remove the influence of data set splitting, we ran the RF model 100 times by splitting the data set into resampled training data (80%) and testing data (20%). The importance scores produced at the prefecture level suggest that all the selected covariates are meaningful in describing IWU, as illustrated in Figure 2c. A similar pattern of high importance variables is observed at the field scale, further supporting the selected variables and the feasibility of the proposed models.
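To make the idea of permutation-based importance concrete, the following sketch shuffles one feature at a time and records the resulting increase in mean squared error for an already fitted model. It is a generic illustration rather than the RF-specific score used in the study: the `predict` callable stands in for any trained model, the repeated 80%/20% splitting is omitted, and all names are hypothetical.

```python
import random


def mse(predict, rows, targets):
    """Mean squared error of predict() over rows (lists of feature values)."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    scores = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + [c] + r[j + 1:] for r, c in zip(rows, column)]
            increases.append(mse(predict, shuffled, targets) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


if __name__ == "__main__":
    # Toy model that uses only the first feature; the second should score 0.
    rows = [[float(i), float(i % 3)] for i in range(20)]
    targets = [2.0 * r[0] for r in rows]
    print(permutation_importance(lambda r: 2.0 * r[0], rows, targets))
```

A feature the model never uses receives a score of exactly zero, which is the property that lets such scores separate informative covariates from uninformative ones.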

ML-Based Future Scenario Simulations
ET during the historical period and future period was calculated using Equation 6 (Teo et al., 2022; Zhang et al., 2001):
$$\mathrm{ET} = \sum_{i} l_i \times P \times \frac{1 + w_i\,\dfrac{\mathrm{PET}}{P}}{1 + w_i\,\dfrac{\mathrm{PET}}{P} + \dfrac{P}{\mathrm{PET}}} \tag{6}$$
where l_i is the area percentage of land type i, and w_i is the available water coefficient. The PET was calculated using the FAO Penman-Monteith method described in Equation 2. While such a method allows for various land types, we exclusively focus on cropland.
The calculated ET was evaluated for the historical period by comparing it against the GLEAM product during 1981-2013. Our ET is correlated with the GLEAM product with R2 between 0.61 and 0.70 (p < 0.01) (Figure S6 in Supporting Information S1), indicating that the calculated ET is reliable.
Since it is challenging to simulate SSM and RZSM, we used a machine learning-based calibration model (Callaghan et al., 2021; Liu et al., 2021) to estimate SSM and RZSM for scenario periods. The CMIP6 precipitation, maximum air temperature, minimum air temperature, shortwave radiation, and SSM (and RZSM) are used as explanatory factors. Specifically, we first establish random forest-based regressions between ESA SSM (ERA RZSM) and explanatory factors during 1981-2013 and then apply these regression relationships to the explanatory factors in both historical and future scenarios to obtain the final SSM and RZSM simulations.
To confirm the accuracy of the simulated soil moisture, we validated our method of calculating SSM and RZSM during 1981-2013 against the ESA SSM and ERA RZSM products. Our product is highly correlated with the ESA SSM with R2 between 0.54 and 0.64 (p < 0.01) and with the ERA RZSM with R2 between 0.47 and 0.55 (p < 0.01) (Figures S7 and S8 in Supporting Information S1). We thus consider the machine-learning approach robust for further IWU calculation.

Evaluation and Validation
Before model establishment and application, we checked the variations and trends of the selected explanatory factors during 1981-2013 based on satellite observations and reanalysis outputs. To better understand the variations and shifts of IWU during the past and future, we also checked the change ratio of earth system model-based simulations during 2031-2100 relative to baseline periods. We used Sen's slope estimator to determine the grid-level trends and change ratios based on the Seasonal and Trend decomposition using Loess (STL)-decomposed components that exclude the seasonal signal and the remaining components (Liu et al., 2021).
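For reference, Sen's slope estimator is the median of the slopes between all pairs of observations, which makes it robust to outliers. The sketch below applies it to a single series; the STL decomposition step is not shown, and the function name and the optional `times` argument are assumptions for this example.

```python
from statistics import median


def sens_slope(values, times=None):
    """Sen's slope: the median of slopes between all pairs of observations.

    times defaults to 0, 1, 2, ... (e.g., consecutive years).
    """
    if times is None:
        times = list(range(len(values)))
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(len(values))
        for j in range(i + 1, len(values))
        if times[j] != times[i]
    ]
    if not slopes:
        raise ValueError("need at least two observations at distinct times")
    return median(slopes)


if __name__ == "__main__":
    # The outlier in the last position does not change the estimated trend.
    print(sens_slope([0.0, 1.0, 2.0, 3.0, 100.0]))
```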
To evaluate the proposed model performance, we used the reported IWU in 2011-2013 as a reference data set to assess the estimated prefecture-level IWU while using the data set in 1981-2010 as the training data set. We also used the measurements from 11 cropland sites across China to evaluate the model performance. For this experiment, the data set in 2008-2012 was used as the training period, while 2013-2014 was used as the evaluation period. Three metrics, that is, coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE), were used to evaluate model reliability.
To check the model sensitivity to critical explanatory factors, we used a variety of additional data sets, that is, precipitation, ET, and SSM, to conduct a comparative analysis. In addition, three regression models, that is, extreme gradient boosting (XGB), support vector machine (SVM), and multiple linear regression (MLR) (Text S4 in Supporting Information S1), were compared with the random forest model.
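The three evaluation metrics can be written out in a few lines. The sketch below assumes R2 is defined as one minus the ratio of residual to total sum of squares; some studies instead report the squared Pearson correlation, and the text does not say which form was used. Function names are hypothetical.

```python
import math


def r2_score(observed, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, estimated):
    """Root mean square error, in the units of the input (e.g., km3/yr)."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean absolute error, in the units of the input."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


if __name__ == "__main__":
    reported = [1.0, 2.0, 3.0, 4.0]   # illustrative values only
    modeled = [1.0, 2.0, 3.0, 5.0]
    print(r2_score(reported, modeled), rmse(reported, modeled), mae(reported, modeled))
```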
To investigate the variations and trends of future IWU in the context of climate change and socioeconomic development, we simulated China's annual IWU during 2031-2100 under each of the three future scenarios, based on the training data set in the baseline period. The model performance in scenario simulations was evaluated using references during 2011-2013. Finally, the annual IWU of each scenario was compared with its level during the baseline period, and the corresponding extra cost due to IWU changes was also calculated. The change ratio (CR) in IWU was obtained using Equation 7:
$$\mathrm{CR} = \frac{\mathrm{IWU}_f - \mathrm{IWU}_b}{\mathrm{IWU}_b} \times 100\% \tag{7}$$
where IWU_f and IWU_b are the IWU in the future and baseline periods, respectively.
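The change ratio is a relative difference between future and baseline IWU. A minimal sketch, assuming it is expressed in percent and using a hypothetical function name:

```python
def change_ratio(iwu_future, iwu_baseline):
    """Percentage change of future IWU relative to the baseline IWU."""
    if iwu_baseline == 0:
        raise ValueError("baseline IWU must be non-zero")
    return 100.0 * (iwu_future - iwu_baseline) / iwu_baseline


if __name__ == "__main__":
    # Illustrative numbers only: a rise from 100 to 117.1 units is a 17.1% increase.
    print(round(change_ratio(117.1, 100.0), 1))
```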

Explanatory Factors Change in Historical Observations and Future Scenarios
The accuracy of IWU can be affected by factors controlling water consumption and growth activities.We first investigated the trends and shifts of critical impacting factors.As illustrated in Figure 3a, during the past three decades, the spatial-temporal pattern of climatic variables such as precipitation and air temperature has changed obviously, especially in northern and western China, which is more constrained by water availability.These variabilities in climatic factors could further modify the variations of ET and SM through the water and energy cycle (Aminzadeh & Or, 2013).Critically, an increase in air temperature, particularly occurring in northern China, could enhance crop irrigation water requirements, while the rising radiation could potentially take effect in southern China.We cannot skimpily infer that the impact of a specific variable on irrigation is purely harmful or beneficial.For example, the rise in air temperature will generally cause an increase in water consumption as described above.However, for some regions in western China (such as Xinjiang), it will also alleviate the stress of water supply by increasing the meltwater, which is a crucial irrigation water source for the locals (Li et al., 2014).
We further analyzed the temporal evolution of critical variables in the future, as shown in Figure 3b. The maximum and minimum air temperatures rise by about 2.9% and 1.6%, respectively, during 2031-2100 relative to the 1981-2010 baseline. This indicates that variations in climatic factors may be positive determinants of increasing IWU in the future, which is further supported by increasing ET. Another impact of rising temperatures is the reduction of SWE, which will decline by about 32.7% in the coming decades. Areas where meltwater is the primary source of irrigation water may therefore face increasing water demand and shrinking irrigation water sources at the same time (Dai et al., 2012; Hong et al., 2022). Nevertheless, precipitation will increase in the future by up to 10.8%, which may dampen irrigation water requirements. Collectively, IWU will be controlled by both the positive and negative effects of climate warming, and its pattern remains unclear under future climate change and socioeconomic development.

Accuracy of Estimated IWU
We evaluated the estimated annual IWU during 2011-2013 against the reported IWU from 339 prefectures in China. Figure 4a shows the spatial distribution of the estimated IWU, which varies from 0.01 to 6.8 km³/yr. The majority of China's prefectures have an IWU of less than 3.0 km³/yr, and high IWU occurs in northern China, which receives less precipitation. Figure 4b shows the scatterplots of the estimated IWU versus the reported IWU. A high R² (0.86-0.91) and a low RMSE (0.261-0.361 km³/yr) are obtained. The promising performance of our model is further corroborated by comparison with two existing products in the provinces and prefectures of China (Figures S9 and S10 in Supporting Information S1).
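The two accuracy metrics used here can be sketched in a few lines. This is a minimal illustration that assumes R² is computed as the coefficient of determination (one minus the ratio of residual to total sum of squares) rather than as a squared correlation; the function names and sample values are hypothetical and are not data from the study.

```python
import math


def rmse(estimated, reported):
    """Root mean square error between estimated and reported values."""
    n = len(reported)
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reported)) / n)


def r_squared(estimated, reported):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_reported = sum(reported) / len(reported)
    ss_res = sum((r - e) ** 2 for e, r in zip(estimated, reported))
    ss_tot = sum((r - mean_reported) ** 2 for r in reported)
    return 1.0 - ss_res / ss_tot


# Hypothetical prefecture-level IWU values in km^3/yr, for illustration only.
reported = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]
print(f"R2 = {r_squared(estimated, reported):.3f}, "
      f"RMSE = {rmse(estimated, reported):.3f} km^3/yr")
```

RMSE carries the units of IWU (km³/yr), so it should be read against the range of values (0.01 to 6.8 km³/yr here), whereas R² is dimensionless.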
The proposed model was further evaluated at 11 selected cropland sites mainly supplied by irrigation. The estimated IWU exhibits a relatively high correlation with the field measurements (Figure 4c, with an R² of 0.67). Combining the evaluations at the prefecture and field scales, we believe that our IWU calculation framework can capture the dynamic changes of irrigation. However, lower accuracy may be obtained over dry regions due to the uncertainty in the available data sets (especially ET and SM) and model applicability (Figure 4d). Data availability is also low in extreme regimes (K. Liu et al., 2022; Sadayappan et al., 2022), which is supported by the fact that only 21% of prefectures have IWU above 3 km³/yr.
We extended the proposed model to three future scenarios by transferring its learning capacity to four earth system model simulations. Figure 4e displays the spatial patterns of the annual IWU estimated by our model for 31 provinces. The four earth system model simulations yield consistent spatial patterns of IWU during 2011-2013. Figure 4f compares the estimated IWU with the reported IWU. Despite noticeable differences among the four earth system models, the accuracy of the IWU remains high, with R² larger than 0.87. The TaiESM1 model has the highest accuracy (R² = 0.922), followed by MIROC6, MIROC-ES2L, and CESM2 (R² = 0.915, 0.912, and 0.876, respectively). Our work is consistent with earlier studies that used earth system model simulations for modeling agricultural variables such as crop yield (He et al., 2022) and nitrous oxide (Schauberger et al., 2017). In summary, the estimated IWU is reliable and robust enough for further analysis.

Model Uncertainty Analysis
Precipitation and ET are essential in capturing dynamic changes in soil water and thus influence the performance of the IWU estimation model. Therefore, we replaced the ERA5 precipitation with the CRU and GPCC products, and the GLEAM ET with two widely used ET products (NTSG and SYNET), when calculating IWU. Figures 5a and 5b show the accuracy of the available precipitation and ET data sets for IWU estimation. Relatively small discrepancies (<2%) in accuracy metrics are observed among the precipitation drivers, but remarkable discrepancies are observed for ET. This is reasonable considering that ET is highly related to plant biophysical properties that control water compositions, for example, leaf area index, vegetation greening, and canopy conductance. The unsatisfactory accuracy of current ET products also contributes to such uncertainty in drylands such as northern China (Liu et al., 2019; Yang et al., 2012).
The data gaps in satellite soil moisture products, generally related to the discontinuous coverage and revisit frequency of satellite constellations, are a critical obstacle to implementing IWU estimation models. Accordingly, a machine learning-based gap-filling method was used to generate nearly complete coverage of soil moisture. Comparison analysis (Figure 5c) shows that the gap-filled ESA SSM substantially improves the accuracy of the IWU calculation, by 40%. To confirm this, the IWU calculated from ESA SSM was further compared with the estimates based on ERA5 and GLEAM SSM. All the derived IWU estimates show acceptable accuracy but vary noticeably. Higher accuracy is found for ESA SSM, suggesting the superiority of the reconstructed satellite product in modeling IWU.
The proposed RF-based model was further compared against three extensively used regression models (Figure 5d). XGB calculates IWU with high accuracy, suggesting the stable applicability of tree-based models and the broad feasibility of the selected variables. SVM shows lower accuracy than RF, indicating that model structure is a potential source of uncertainty in machine learning approaches and can limit predictive accuracy (Biau & Scornet, 2016). The lowest accuracy is found for MLR, further confirming the remarkable performance of the ML model.
Additionally, we acknowledge that urban landscapes within cities or prefectures may contribute relatively little to overall crop cultivation in China. For this reason, we have considered irrigation area as a critical variable to depict land cover information. It is important to recognize that statistical irrigation area data may introduce uncertainties. While the impacts of these uncertainties are not substantial (Figure S12 in Supporting Information S1), further investigation is warranted to ensure the robustness of our model.

Variations and Trends of Future IWU Estimates
Since future model drivers are available from a variety of earth system models, the proposed framework was used to calculate future IWU by assuming a constant irrigation area from 2010 onward. Figure 6a shows the temporal changes in IWU over mainland China during 2031-2100 based on the CMIP6 simulations (a similar pattern is shown in Figure S14 in Supporting Information S1). Under the SSP3-70 and SSP5-85 scenarios, China's IWU will increase markedly by the 2100s, whereas only an insignificant increasing trend is observed in the SSP1-26 scenario.
With the historical period (1981-2010) as a baseline, Figures 6b and 6c show that under the SSP1-26 and SSP3-70 scenarios, China's IWU will increase by 8.5%-11.6% (6.8%-14.4%) and 10.5%-12.3% (16.4%-18.9%), respectively, by the middle (late) century. In the SSP5-85 scenario, IWU will increase by 11.3%-17.1% (20.9%-34.8%). It is important to emphasize that the TaiESM1 model exhibits considerable uncertainty in its performance. Although it indicates a larger overall national IWU during 2081-2100 than during 2031-2050, this model may estimate lower IWU values for specific provinces. Collectively, IWU in China is likely to increase under future climate and socioeconomic change: slightly by the mid-century and markedly by the late century.
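Ranges such as those quoted above summarize the spread across the earth system models. A minimal sketch of that summary step is given below, assuming the range is the minimum and maximum across models and the central value is the median (as described for the dots and bars in Figure 6); the function name and the per-model numbers are hypothetical and are not results from the study.

```python
from statistics import median


def summarize_ensemble(change_ratios: dict[str, float]) -> dict[str, float]:
    """Return the min, median, and max change ratio across models."""
    values = list(change_ratios.values())
    return {"min": min(values), "median": median(values), "max": max(values)}


# Hypothetical per-model change ratios (%) for one scenario and period;
# these are placeholders, NOT values reported in the study.
example = {"CESM2": 9.0, "MIROC6": 10.0, "MIROC-ES2L": 11.0, "TaiESM1": 12.0}
summary = summarize_ensemble(example)
print(f"{summary['min']:.1f}%-{summary['max']:.1f}% (median {summary['median']:.1f}%)")
```

With only four models, the full range is sensitive to a single outlying model, which is one reason to report the median alongside it.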
By checking the spatial patterns of the IWU changes during 2031-2100 (Figures S15 and S16 in Supporting Information S1), we find that China's IWU will increase in ∼60% of provinces, which have already experienced water scarcity, and that a reduction of IWU in the future may alleviate their water supply stress. Meanwhile, water scarcity may be eliminated in southern China due to the decreased IWU associated with high precipitation. It should be noted that IWU will shift from decreasing under the SSP1-26 scenario to increasing under the SSP3-70 scenario in some provinces, such as Shaanxi, Henan, and Ningxia. This shift is more evident in the SSP5-85 scenario, suggesting that more IWU will be required if future warming scenarios switch from low to high.
We calculated the extra cost resulting from future IWU changes relative to 1981-2010, using the average IWU price of US $0.09 per cubic meter in China (Figures 6b and 6c). Obtaining the value of water accurately is challenging due to the lack of a standardized pricing mechanism across different countries and regions. Here we employed a global mean water value for major crops as a useful starting point for our analysis (Bierkens et al., 2019; D'Odorico et al., 2020), allowing us to gain a broad understanding of the patterns and economic effects of IWU. For the SSP1-26 scenario, the extra cost due to the increased IWU will be US $1.65-3.25 billion/year by the 2050s and US $2.28-3.28 billion/year by the 2100s. For the SSP3-70 scenario, the extra cost will be US $2.37-3.18 billion/year by the 2050s and US $3.11-5.3 billion/year by the 2100s, higher than that for the SSP1-26 scenario. Moreover, under the SSP5-85 scenario, the additional expenditure induced by IWU changes will reach US $2.37-3.91 billion/year by the 2050s and US $4.88-6.5 billion/year by the 2100s. Collectively, higher costs will be incurred in future scenarios affected by climate change and socioeconomic development. These findings call attention to the threat to future irrigation water use and even food security.
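The cost figures follow from a unit conversion, sketched below under the assumption that the extra cost is simply the change in IWU volume multiplied by the unit price of US $0.09 per cubic meter. The function name and the example volume are hypothetical; only the price comes from the text.

```python
M3_PER_KM3 = 1e9          # cubic meters in one cubic kilometer
PRICE_USD_PER_M3 = 0.09   # average IWU price quoted in the text


def extra_cost_billion_usd(delta_iwu_km3: float,
                           price_usd_per_m3: float = PRICE_USD_PER_M3) -> float:
    """Annual extra cost (billion USD/yr) for an IWU change given in km^3/yr.

    Assumes cost = volume change x unit price, with no discounting or
    regional price variation.
    """
    cost_usd = delta_iwu_km3 * M3_PER_KM3 * price_usd_per_m3
    return cost_usd / 1e9


# Hypothetical IWU increase of 20 km^3/yr, for illustration only.
print(f"{extra_cost_billion_usd(20.0):.2f} billion USD/yr")
```

Because one cubic kilometer is 10⁹ cubic meters, each additional km³/yr of irrigation water corresponds to US $0.09 billion/year at this price.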

Contributions of Remote Sensing and ML to IWU Calculation
The performance of the IWU calculation depends strongly on appropriate model drivers, especially the completeness of SM and the accuracy of ET (Zhang et al., 2022). On the one hand, SM plays a crucial role in estimating IWU by characterizing soil water holding capacity. Our study demonstrates that combining satellite-based SSM and reanalysis RZSM allows for reliable IWU estimates. By considering both SSM and RZSM, our model captures the combined effect of immediate and sustained water availability on crop irrigation demand, enhancing accuracy, particularly in regions with diverse soil and crop types. The two soil moisture variables complement each other in representing different aspects of water availability to crops. Nevertheless, the lack of a long-term, full-coverage soil moisture record is a notable issue for national-scale analysis. Here a machine learning-based approach is introduced to reconstruct nearly full-coverage satellite-derived SSM products, and the reconstructed SSM improves the model accuracy by ∼40% relative to the raw ESA SSM. On the other hand, ET is important in determining soil water dynamics and is correlated with crop biophysical characteristics. Previous studies have illustrated that GLEAM ET, as a seamless satellite-derived product, performs well in estimating IWU at a regional scale (Jalilvand et al., 2019; Shah et al., 2019). Our study corroborates the applicability of the GLEAM model for national-scale IWU estimation and demonstrates that GLEAM ET is better at estimating IWU than other ET data sets. Given that GLEAM may underestimate soil evaporation since it measures only surface soil moisture (Zhou et al., 2021), an improved version of the ET data set is recommended for future work. In addition, a set of future ET and SM data sets is well simulated by earth system models in our work, which allows the ML-based IWU model to be driven under future scenarios. Such a scheme for future scenario analysis can potentially be extended to other agricultural and biogeochemical measures.
Our results highlight the effectiveness of machine learning for estimating both present and future IWU. While the XGB and SVM models achieved commendable accuracy in calculating IWU, they slightly underperformed compared to the RF model. The consistent performance across machine learning approaches strengthens the credibility of our framework, affirming the relevance of the selected variables to IWU dynamics. These models leverage intricate relationships among environmental factors, facilitating reliable predictions of irrigation water requirements. Discrepancies among models may arise from factors such as feature engineering, hyperparameter tuning, and data distribution (Zounemat-Kermani et al., 2021). The potential uncertainties stemming from model complexity highlight the importance of further research and model refinement. Note that our study primarily focuses on the city or prefecture level in China, providing localized and relevant insights for policy decisions and water resource management strategies. However, our findings can also serve as references for province-scale studies. Training the IWU estimation model at the province level may lead to higher accuracy than at the prefecture scale, benefiting from a larger training data set (i.e., the combination of all prefectures within the province) that can better capture variations and avoid extreme ranges.

Comparing With Existing Studies
Numerous models have been proposed for estimating IWU in the current literature. For instance, the SAtellite Monitoring of IRrigation (SAMIR) models and soil water balance models have been widely utilized for field-scale IWU estimation (Jalilvand et al., 2019; Simonneaux et al., 2009). However, many of these earlier methods have been validated primarily at the regional scale and rely on sufficient in situ measurements to optimize their models, which limits their nationwide implementation. Moreover, previous studies focusing on relatively coarse spatial resolutions at sub-catchment and watershed scales may underestimate IWU due to a lack of constrained prior knowledge (Parsinejad et al., 2022; Zaussinger et al., 2019). In contrast, our proposed model effectively overcomes these limitations, achieving a balance between spatial scale and estimation accuracy. The national IWU estimates derived from our study exhibit reasonable accuracy, as demonstrated by R² values ranging from 0.86 to 0.91 and RMSE values between 0.261 and 0.361 km³/yr at the prefecture level. These accuracy metrics are comparable or even superior to those reported in existing studies (Table S3 in Supporting Information S1) and in two IWU data sets obtained from satellite data sets and process-based models (Figures S9 and S10 in Supporting Information S1).
Despite the growing recognition of agricultural water stress under climate change (Guo & Shen, 2016; Zhang & Cai, 2013), its future trends and variations have yet to be established due to data opacity and model constraints. While some previous studies have attempted to simulate global- or national-scale future IWU (Elliott et al., 2014; Wada & Bierkens, 2014), the spatial resolutions of these endeavors have proven too coarse to capture intricate dynamics in specific regions. Our extension work offers a valuable tool to discern the national pattern of IWU under future scenarios. To the best of our knowledge, this research represents an initial step toward comprehending the potential impacts of climate change and effective emission controls on agricultural irrigation in China.

Implications, Caveats, and Prospects
Our work implies that the conflict between water supply and withdrawal will be exacerbated under all emissions scenarios in China, especially in northern and western provinces. In other words, farmers in China may face a heavier burden under the worst-case future scenario, which should raise awareness among the government and stakeholders. To avoid the adverse effects of water scarcity resulting from the increased IWU, a variety of measures are needed to save agricultural water use, such as optimizing crop planting structures and popularizing high-efficiency irrigation technology. Meanwhile, attention should be given to ensuring water supply, which is essential due to the already uneven spatial distribution of water supply and demand across China (Jiang, 2009).
Although the southern provinces of China are rich in water resources, most northern provinces, where a rise in IWU is projected, tend to face worsening water shortages. It is thus necessary to accelerate the construction of water conservancy projects in northern and western China, making full use of water resources such as sewage and rainwater and reducing the overexploitation of groundwater.
Alongside the merits of our work, certain caveats warrant careful consideration. First, the spatial resolution of our available data sets and simulated products may be too coarse to effectively capture fine-scale variations in ET and soil moisture. Due to this limitation, the consideration of crop type and irrigation equipment was constrained in our model. To address this, future research could incorporate high-resolution satellite observations and numerical model simulations for more detailed estimations. Second, within the proposed machine learning-based framework, quantifying the contribution of groundwater to irrigation poses a challenge due to the limited focus on physical mechanisms. Given that groundwater serves as a critical water source for agricultural irrigation, particularly in northern China, the estimation of IWU may be prone to overestimation if groundwater supplies are not adequately accounted for. Third, our method may not fully capture the actual heterogeneity of land use changes. Factors such as stakeholders' choice of crop types, the adoption of efficient irrigation methods, and various farm management practices, including mulching, introduce significant uncertainties in the simulation and prediction of IWU, which are especially pronounced in the arid regions of Northwest China. To improve accuracy, future advancements could involve integrating crop models that consider irrigation efficiency and water distribution in the irrigation process. Lastly, the selection of model structures and explanatory variables significantly impacts the accuracy of model simulation results. Emphasizing the rationale behind model choices, particularly in efficiently handling extreme values, is of paramount importance in future research efforts.
It should be mentioned that the unique characteristics of the northern arid zones in China, including data limitations, model structure uncertainty, and the complex interplay of factors affecting IWU, make them particularly challenging for accurate IWU estimation using remote sensing. One notable challenge arises from the distribution of our validation sites. The high accuracy of the developed model against the site-scale data may arise because most of the collected sites are concentrated in eastern China, with fewer sites available for comparison in the northern arid zones (e.g., Gansu, Ningxia) due to the inaccessibility of the data. This limitation underscores the necessity of addressing geographic diversity in validation data to ensure the model's generalizability across a wider range of regions. This concern will be considered in our future work, where we intend to expand our data set through enhanced field measurements and integration of model-simulated data. On the other hand, our model encountered challenges when simulating IWU in the northern arid zone. The limitations in accuracy and spatial resolution of evapotranspiration data in these drylands contributed to this issue. Additionally, it is likely that the machine learning model overlooked subtle signals that are crucial for capturing extremely high IWU. Addressing these challenges requires a combination of improved data collection, more advanced remote sensing technologies, and modeling techniques that can account for the complexities of the region.
While our study employs static irrigated areas for future scenarios, it is crucial to acknowledge that future irrigation water use is influenced by various factors. In addition to climate variables, the expansion and intensification of croplands play a significant role in driving irrigation water demand. Regions undergoing such changes are likely to experience higher irrigation requirements than our model estimates suggest. Conversely, climate change-induced snow melting has profound implications for irrigation water use, especially in snow-reliant regions (Qin et al., 2020; Zhu, Kim, et al., 2022). As global temperatures rise, accelerated snowmelt can alter water availability and runoff patterns, and can even impact groundwater resources, further affecting irrigation demands. Therefore, understanding the dynamic interplay of cropland changes and climate factors is essential for accurate projections of future irrigation water use, warranting further investigation.
Earth's Future 10.1029/2023EF003562 LIU ET AL.

Conclusions
Machine learning has emerged as a powerful tool for simulating and predicting ecohydrological variables, offering great potential for estimating IWU on a national or global scale. Our study presents a robust framework for prefecture-scale IWU calculation, employing a machine learning-based model integrated with a bootstrap strategy and a comprehensive long-term prefecture-level training data set spanning approximately 30 years. The selection of an appropriate model structure and the identification of effective model drivers are crucial steps in our approach, which draws on a diverse range of data sources. These sources include satellite observations, field measurements, and relevant model knowledge. Our study establishes the credibility of the chosen variables in accurately depicting IWU dynamics. This is evident in both the station-level assessment, encompassing 11 cropland stations, and the prefecture-level evaluation, encompassing 339 prefectures. Notably, our model is adaptable to future scenarios, contingent upon the availability of reliable data sets. Specifically, we extend our proposed machine learning model to project IWU trends up to the 2100s, utilizing data from four earth system models. The results indicate that, under various emissions scenarios, China's IWU is expected to increase by 8.5%-17.1% (6.8%-34.8%) by the 2050s (2100s) compared to the historical period (1981-2010), with higher emissions resulting in more significant increases. This rise in IWU by the 2050s (2100s) is associated with an estimated additional cost of US $1.65-3.91 ($2.28-6.5) billion per year. Our findings carry substantial implications for policymakers, providing them with valuable insights to anticipate forthcoming challenges and opportunities in water resources management. This is particularly pertinent to water security and the intricate balance between water supplies and usage across diverse regions.
(Hu et al., 2021). The CMIP6 model data (Eyring et al., 2016) are available from the website (https://esgf-node.llnl.gov/search/cmip6/). The crucial sections of the computer code employed for processing the results and creating the figures can be found in Liu et al. (2024).

Figure 1. Study region and conceptual framework. (a) Map of irrigated area fraction across China (red to green shading), with province (solid gray) and prefecture (dashed gray) boundaries delineated. Locations of the 11 cropland sites are flagged (in green). (b) Area of irrigation in China over the period 1981-2013. (c) Flowchart of the methodology.

Figure 2. Correlation between irrigation water use (IWU) and its explanatory factors, including P (precipitation), ET (evapotranspiration), SSM (surface soil moisture), RZSM (root zone soil moisture), SRAD (shortwave radiation), Tmax (maximum air temperature), and Tmin (minimum air temperature). (a) Scatterplots of IWU against the explanatory variables at field scale. (b) Averaged IWU (blue bar) and the associated standard deviation (error bar) corresponding to a range of percentiles for individual explanatory factors. (c) Average (blue bar) and standard deviation (error bar) of the permutation-based importance of explanatory factors to prefecture-level IWU. The relative importance of individual explanatory variables at field scale is also displayed (red circle).

Figure 3. Magnitude of change for explanatory factors. (a) Trends of individual explanatory factors during 1981-2013, including P (precipitation), ET (evapotranspiration), SSM (surface soil moisture), RZSM (root zone soil moisture), SWE (snow water equivalent), SRAD (shortwave radiation), Tmax (maximum air temperature), and Tmin (minimum air temperature). (b) Ratio of change between 2031-2100 and the reference period of 1981-2010 for individual explanatory factors. The dots represent the median values while bars indicate the entire ranges.
Meanwhile, we observe that the model captures intermediate IWU well but underestimates the high values and overestimates the low values of the IWU range. This means that the ML model depicts the dominant factors but potentially neglects some subtle signals that can be essential in capturing extreme IWU (e.g., very high and low values). We further examined this asymmetric performance in calculating IWU by checking the model residuals and found no clear pattern (Figure S11 in Supporting Information S1). Such asymmetric performance is common among ML models and is partly attributed to the low data availability in extreme regimes (K. Liu et al., 2022; Sadayappan et al., 2022).

Figure 4. Evaluation of ML-based irrigation water use (IWU) estimates. (a) and (b) Spatial patterns of IWU estimates based on satellite observations and reanalysis outputs and their accuracy assessment against reported values. (c) Independent accuracy assessment of IWU estimates against field measurements. (d) Accuracy of IWU estimates visualized by dry and humid regions. (e) and (f) Same as (a) and (b) but for four earth system models based on historical scenarios.

Figure 5. Sensitivity analysis of explanatory factors and regression model structure. (a) Relative change of model performance when replacing precipitation with other products as compared to the baseline model. See Figure S13 in Supporting Information S1 for the accuracy of the ML-based model forced with ERA5 precipitation, GLEAM ET, and ESA gap-filled SSM. (b)-(d) Same as (a) but for ET, SSM, and regression approach, respectively.

Figure 6. Variations and trends of irrigation water use (IWU) during 2031-2100. (a) Time series of China's IWU under low-, medium-, and high-emissions scenarios. (b) and (c) Change ratios of China's IWU at the province scale by the middle (2031-2050) and late (2081-2100) century relative to the historical period (1981-2010). The additional cost induced by this change is shown in the bar plot, which indicates the entire range. Dots show the median value across the specified models while bars indicate the entire ranges.

Table 1
Description of the Cropland Field Stations
Cui et al. (2014)
Note. References are provided to describe the sites.
In addition, our work may provide evidence for water resource management, such as China's South-to-North Water Diversion Project (Y.-C. Wang et al., 2021), which has been transferring water from southern China to northern China since 2013. While water scarcity in some parts of northern China (e.g., the Haihe River Basin and the Yellow River Basin) will be exacerbated by the increased IWU under a warming climate, in other regions (e.g., the Huaihe River Basin) it will be slightly alleviated (Figures S15 and S16 in Supporting Information S1). Hence, it is necessary to consider the adaptation strategy of the water diversion projects.