A Novel Double Machine Learning Strategy for Producing High‐Precision Multi‐Source Merging Precipitation Estimates Over the Tibetan Plateau

Precipitation estimation over the Tibetan Plateau is a critical but challenging task due to sparse gauges and high altitudes. Traditional statistical methods are often insufficient to characterize the nonlinear relationships among different precipitation sources, whereas machine learning techniques, particularly deep learning algorithms, offer a novel and powerful approach to improving the merging accuracy of multi-source precipitation data by efficiently capturing their spatiotemporal dynamics. This study introduces a novel strategy called Double Machine Learning (DML), which integrates meteorological information, satellite retrievals, and reanalysis data to produce a high-precision multi-source merging precipitation product at 0.1° × 0.1°, daily resolution for the Tibetan Plateau. The quantitative evaluation of DML was accomplished using both automatic meteorological gauges and independent observations. Statistical scores indicate that the new DML-based merging product clearly outperforms three widely used precipitation datasets (IMERG-Final, GSMaP-Gauge, and ERA5) over the Tibetan Plateau. The proposed DML strategy effectively integrates the advantages of traditional machine learning and deep learning, significantly enhancing algorithmic robustness and merging accuracy, particularly at medium-to-high rain rates in summer. Furthermore, the contributions of the multi-source inputs to the final merging effect were systematically analyzed. We find that meteorological information, used as an auxiliary variable in DML, plays a crucial role in identifying rainy events and adjusting the bias of precipitation estimates, especially over ungauged regions. This study affirms the value of improving multi-source precipitation estimates by combining different machine learning approaches. The new merging precipitation product reported here is recommended for hydrometeorological users in the Tibetan Plateau science community.


Introduction
Precipitation is one of the key hydroclimatic factors in water and energy cycles (Allen & Ingram, 2002). The Tibetan Plateau (TP), often referred to as the "Asian water tower", is the source region of many major rivers on the Asian continent (Huntington, 2006; Nash & Sutcliffe, 1970). Accurate precipitation estimation over the TP is of great importance for studying the hydrological processes of these river basins (Takido et al., 2016). However, it is difficult to precisely measure precipitation with conventional rain gauge networks or weather radars over the TP because their distributions are sparse and data availability in complex terrain and harsh climates is rather limited. In practice, few rain gauges (fewer than 0.6 per 10,000 km²) and scarcely any radars are currently deployed over the TP (Kang et al., 2010; Liu & Yin, 2001). The density of ground observations is inadequate to represent areal precipitation for high-altitude watersheds, which hinders local hydrologic investigation and water resources management.
Satellite remote sensing can obtain real-time and frequent observations from space, offering a potential alternative source of precipitation estimates for the TP. The Tropical Rainfall Measuring Mission (TRMM), launched on 27 November 1997, provided important and valuable rainfall information over the tropics and subtropics, which effectively improved our understanding of the spatiotemporal distribution and dynamic change of precipitation (Huffman et al., 2010). To fill the absence of high-latitude observations (Milewski et al., 2015), the Global Precipitation Measurement (GPM) Core Observatory was launched on 28 February 2014 (https://gpm.nasa.gov/missions/GPM), aiming to improve the retrieval of light and solid precipitation and, meanwhile, to extend data coverage from TRMM's original 35°S/N to 65°S/N (Huffman, 2016).
Two representative satellite-based precipitation products (SPPs) derived from GPM core observation data, that is, the Integrated Multi-Satellite Retrievals for Global Precipitation Measurement (IMERG; Huffman et al., 2019) and the Global Satellite Mapping of Precipitation (GSMaP; Kubota et al., 2007, 2017), have displayed acceptable data quality (Tang et al., 2016; Zhou et al., 2020) and have been widely applied in hydrological process simulation and water resources management at local, regional, and even global scales (Tashima et al., 2020; Wang et al., 2017). Some previous studies have also evaluated the effectiveness of SPPs over the TP, most of them focusing on IMERG and GSMaP (Li et al., 2022; Lu & Yong, 2018; Zhu et al., 2018). However, almost all evaluations suggest that both IMERG and GSMaP perform relatively poorly there due to the interference of complex terrain and the lack of ground observations (Gao & Liu, 2013; Wang et al., 2018). Admittedly, it is a challenging task to create a practical solution that effectively improves the accuracy of precipitation estimates over this specific region.
Compared to SPPs, atmospheric reanalysis products provide long-term, wide-coverage series of various meteorological variables, which offer additional and important information for substantially improving precipitation estimates over the TP. Reanalysis data are derived from data assimilation systems, which can better represent the physical and dynamical processes of the atmospheric circulation (Zhang et al., 2013). The application of reanalysis data has been a popular topic in the TP science community (You et al., 2012). For instance, the European Centre for Medium-Range Weather Forecasts (ECMWF) produced the fifth generation of its atmospheric reanalysis of the global climate (ERA5; Hersbach et al., 2020), currently one of the most widely used reanalysis datasets over the TP. Owing to its spatiotemporal resolution being similar to that of the SPPs, ERA5 is often compared with IMERG and GSMaP in precipitation assessments (Li et al., 2018). Nevertheless, some controversies remain about the evaluation conclusions. At present, it is generally agreed that both SPPs and reanalysis data have their own strengths and weaknesses (Tong et al., 2014). Normally, SPPs have a relative advantage in data accuracy and spatial resolution, while reanalysis data can better capture the spatial pattern of the precipitation distribution (Li et al., 2021; Ma, He, et al., 2019).
To fully exploit the individual advantages of multiple datasets, multi-source fusion technology has been employed in precipitation estimation (Wang et al., 2022). Given different weights for each data source, multi-source precipitation datasets (such as gauge-based observations, satellite-based estimates, and reanalysis data) are integrated to produce an optimal merged product. For example, Beck et al. (2017, 2019) illustrated that the Multi-Source Weighted-Ensemble Precipitation (MSWEP) has the best performance among 22 mainstream precipitation datasets, based on a comparative analysis of 76,086 rain gauges and runoff simulations for 9,053 small and medium-sized watersheds worldwide. However, the merging effect of these traditional weight-based methods depends largely on the data accuracy of the input sources. Consequently, the accuracy of the final merged product is commonly restricted by the worst input source, so it might not perform as well as expected (Chen et al., 2021; Duan et al., 2022). Moreover, the precision of multi-source merged products strongly relies on the density of the ground observation network, which is particularly challenging for the TP region with its sparse gauges. Thus, a new multi-source merging approach that does not depend on weight allocation and needs only a small number of ground observations is widely anticipated.
Over the past decades, some studies have introduced machine learning techniques into hydrological applications, including multi-source precipitation fusion and impact factor analysis (Li et al., 2021; Xiao et al., 2022; Zandi et al., 2022). Relative to traditional statistical methods based on weight allocation, machine learning techniques have displayed several evident advantages: (a) they are good at solving complex problems that are hard to address with traditional methods such as simple linear regression; (b) they can quickly integrate more information at a lower computational cost; and (c) they can easily build a dedicated model structure for a specific problem (Zhang et al., 2021).
In the early stages, traditional machine learning (TML) models, such as eXtreme Gradient Boosting (XGBoost), Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN), gained consistent recognition for their ease of use and computational stability. Recently, some studies have compared different TML models in precipitation fusion and rainfall forecasting, indicating that XGBoost and RF generally perform better than other models (Dong et al., 2023; Senocak et al., 2023). On the other hand, comparisons between TML and the newly emerging deep learning (DL) approach show that the latter is more advantageous under unrestricted conditions (Chauhan & Singh, 2018; Wang et al., 2021). Looking to the future, DL will play an important, long-term role in dealing with big data and complex problems (Alzubaidi et al., 2021). However, studies have also shown that DL is not a good choice for some special data types and sample series; under certain conditions, TML (e.g., XGBoost) might actually perform better than DL (Shwartz-Ziv & Armon, 2022). Thus, it is imperative to compare different machine learning models and select the most suitable one, or combination, to develop an operational strategy for producing high-precision multi-source merging precipitation datasets for the TP.
To date, some studies have focused on individually applying TML or DL techniques to precipitation estimation (Pan et al., 2019; Wu et al., 2020). However, few studies have integrated these two types of techniques to develop better precipitation products by making full use of their respective strengths. Hence, in this study, we propose a novel strategy called Double Machine Learning (DML) to develop a high-quality precipitation product for the TP region by decomposing the complex problem of multi-source precipitation fusion into two steps, that is, the classification of rainy events and the regression of rainfall amount. Several TML models are compared and employed to address the classification issue in the first step, while a more sophisticated model structure is needed for the second step. Thus, we incorporated Long Short-Term Memory (LSTM) to solve the second sub-problem. LSTM is a classic and widely used DL model that can efficiently capture long-term dependencies of the input sequence through gate mechanisms (Shen et al., 2020; Miao et al., 2020; Sahoo et al., 2019; Ma et al., 2022). Some recent studies have reported applications of LSTM in capturing the distinct seasonal variations of the precipitation distribution over the TP (Lu & Liu, 2010).
Combining XGBoost-based classification and LSTM-based regression, the DML strategy can produce multi-source precipitation estimates over the TP. The inputs of the DML strategy include two satellite-based precipitation datasets (GSMaP-Gauge and IMERG-Final), one reanalysis precipitation dataset (ERA5), and available meteorological information (Figure 1). Thus, the two key components of the precipitation estimates, rainfall occurrence and rainfall amount, are independently computed by taking advantage of the above multi-source data inputs. Finally, the computed results are integrated into the precipitation background field to produce the DML-based high-precision TP precipitation dataset (hereafter referred to as DML-TPP). The DML-TPP data have been uploaded to the website of the National Tibetan Plateau Data Center (TPDC, https://data.tpdc.ac.cn/), the largest scientific data center for the Tibetan Plateau in China (Pan et al., 2021).
In this study, our major objectives are threefold: (a) introducing the proposed DML strategy, (b) generating a high-precision multi-source merging precipitation product over the TP, and (c) analyzing the data accuracy and merging characteristics of the new precipitation product. In the next section, we describe the study area and the datasets. A presentation of the methodology follows in Section 3. Results and discussion are presented in Sections 4 and 5, respectively. Summarizing remarks and conclusions finalize the paper in Section 6.

Study Area
The Tibetan Plateau (TP), also known as the world's Third Pole, is a vast region of approximately 2.5 million km², with an average elevation exceeding 4,000 m above sea level. The TP spans 73°E-104°E and 26°N-40°N, extending up to 2,800 km from east to west and 300-1,500 km from north to south (Royden et al., 2008). The TP is rich in glaciers, snow, rivers, lakes, and underground aquifers, and the unique interactions among the atmosphere, hydrosphere, cryosphere, and biosphere ensure the permanent flow of Asia's major rivers (Liu & Chen, 2000). Over this region, precipitation has a profound impact on environmental changes and social development (Yang et al., 2011). However, the majority of rain gauges are distributed along the eastern margin of the TP, while few ground observations are situated in its central and western parts (see Figure 2).

Ground Reference
In this study, we take ground-observed records as the true precipitation values for training and validation. The ground observations used were obtained from the China Meteorological Administration Multisource Precipitation Analysis (CMPA), which incorporates roughly 30,000-40,000 automatic meteorological gauges across mainland China. The CMPA dataset has been provided by the Chinese National Meteorological Information Center (http://data.cma.cn) since 2008 (Shen et al., 2013). In our analysis, 455 grids on the TP were selected, each containing at least one gauge. The CMPA gauges are predominantly distributed in the eastern part of the TP, with relatively fewer gauges in the western region. The original resolution of the CMPA dataset is 0.1° × 0.1° on an hourly basis. To meet the requirements of this study, we aggregated every 24 hourly CMPA records into daily precipitation values.
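The hourly-to-daily aggregation described above can be sketched as follows. This is an illustrative example on synthetic values, not the actual CMPA processing code; the array layout (hours, lat, lon) and the units are assumptions:

```python
import numpy as np

def hourly_to_daily(hourly):
    """Aggregate hourly precipitation of shape (n_hours, n_lat, n_lon)
    into daily totals of shape (n_days, n_lat, n_lon).

    Assumes n_hours is a multiple of 24 and each value is the rainfall
    accumulated over that hour (mm), so summing yields mm/day.
    """
    n_hours, n_lat, n_lon = hourly.shape
    assert n_hours % 24 == 0, "record length must be whole days"
    return hourly.reshape(n_hours // 24, 24, n_lat, n_lon).sum(axis=1)

# Two days of synthetic hourly data on a 3 x 4 grid, 0.5 mm every hour
hourly = np.full((48, 3, 4), 0.5)
daily = hourly_to_daily(hourly)   # shape (2, 3, 4), 12 mm per day
```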
Precipitation records from 23 independent rain gauges were also used for our evaluation. These gauges are completely independent of those in CMPA (Yang et al., 2023). All of them are located above 4,000 m in the center of the TP (31.40°N-32.51°N, 79.69°E-90.86°E), where ground gauges are sparsely distributed. Each observing point comprises a Hobo Onset tipping-bucket rain gauge with a precision of 0.2 mm and a sampling interval of 1 hour. According to the Chinese national standard, all gauges were installed with the gauge top 70 cm above the ground surface. During installation, the connection cables between the gauge and the logger were enclosed in a weatherproof box to avoid direct exposure. Routine maintenance was performed in the field every summer, so these gauges have worked steadily until now (see Figure 2). Observations from the independent gauges are more reliable than CMPA; however, their limited number restricts them to validation use only.

Precipitation Datasets
Table 1 presents the spatiotemporal coverage and resolution of datasets used in our study.As the final run products, two satellite-based precipitation products (i.e., IMERG-Final and GSMaP-Gauge) have relatively better performance owing to the gauge-based adjustments employed in their retrieval systems.By assimilating large amounts of ground-based observations, atmospheric detections, and remote sensing data, ERA5 provides reasonable spatial and temporal variability of precipitation at a large scale.

Satellite-Based Precipitation Data Sets
IMERG is the Level-3 multi-satellite precipitation system of GPM, which combines various sources of precipitation data, including constellation microwave precipitation estimates, infrared (IR) satellite estimates, and monthly rain gauge data. As one of the IMERG suite, IMERG-Final is adjusted with monthly gauge data from the Global Precipitation Climatology Centre (GPCC). Both Dual-frequency Precipitation Radar (DPR) and GPM Microwave Imager (GMI) data are used to calibrate the retrievals from multiple passive microwave sensors in producing the IMERG products, which significantly improves the accuracy of the final precipitation estimates (Chen et al., 2020; Huffman et al., 2015). The IMERG data series have reached an unprecedented spatiotemporal resolution of 0.1° × 0.1°, 30 min (https://gpm.nasa.gov/data/imerg). Here, however, we selected only the IMERG-Final product at 0.1° × 0.1°, daily resolution, which is convenient for further processing.
GSMaP is a global satellite precipitation mapping system that combines data from various available passive microwave (PMW) sensors [i.e., TMI, AMSR-E, SSM/I (F13, F14, and F15), and AMSU-B (N15, N16, N17, and N18)] with IR data from all available GEO satellites provided by the National Centers for Environmental Prediction (NCEP)/Climate Prediction Center (CPC). Like IMERG-Final, GSMaP-Gauge is a gauge-adjusted research-grade product, derived from the real-time GSMaP-MVK estimates. The CPC at the National Oceanic and Atmospheric Administration (NOAA) of the United States provides all the gauge observations for its adjustment. It is worth noting that GSMaP-Gauge uses only the DPR data (without GMI) as the calibration standard for PMW retrievals (Lu & Yong, 2020). In this study, we maintained the same principles as for IMERG and selected the 0.1° × 0.1°, daily resolution for GSMaP-Gauge (http://sharaku.eorc.jaxa.jp/GSMaP/index.htm).

Reanalysis Datasets
ERA5, produced by ECMWF, is a new-generation global reanalysis dataset developed with the four-dimensional variational data assimilation (4D-Var) technique. In ERA5, large-scale precipitation is estimated by the cloud scheme, while convective precipitation is retrieved by the convection scheme (Jiang et al., 2021). The ERA5 precipitation estimates are provided at a resolution of 0.25° × 0.25°, hourly (https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5). The spatial and temporal resolutions of ERA5 are thus inconsistent with the above two satellite-based precipitation products. To address this inconsistency, we preprocessed the ERA5 data using bilinear interpolation and 24-hr accumulation.
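The bilinear interpolation used to bring a 0.25° field onto a 0.1° grid can be sketched in plain NumPy. The small grid window and the synthetic field below are illustrative assumptions, not actual ERA5 data or the study's exact domain:

```python
import numpy as np

def bilinear_resample(field, lat_c, lon_c, lat_f, lon_f):
    """Bilinearly interpolate `field` (lat x lon on a regular coarse grid)
    onto the finer target grid. Target points must lie inside the domain."""
    # Index of the coarse cell containing each target coordinate
    iy = np.clip(np.searchsorted(lat_c, lat_f) - 1, 0, lat_c.size - 2)
    ix = np.clip(np.searchsorted(lon_c, lon_f) - 1, 0, lon_c.size - 2)
    # Fractional position inside that cell (the bilinear weights)
    wy = ((lat_f - lat_c[iy]) / (lat_c[iy + 1] - lat_c[iy]))[:, None]
    wx = ((lon_f - lon_c[ix]) / (lon_c[ix + 1] - lon_c[ix]))[None, :]
    f00 = field[np.ix_(iy, ix)]
    f01 = field[np.ix_(iy, ix + 1)]
    f10 = field[np.ix_(iy + 1, ix)]
    f11 = field[np.ix_(iy + 1, ix + 1)]
    return ((1 - wy) * (1 - wx) * f00 + (1 - wy) * wx * f01
            + wy * (1 - wx) * f10 + wy * wx * f11)

# Hypothetical 0.25-degree window with a field that is linear in lat and lon
lat_c = np.arange(30.0, 32.01, 0.25)
lon_c = np.arange(90.0, 92.01, 0.25)
coarse = np.add.outer(lat_c, lon_c)

# Target 0.1-degree grid kept inside the coarse domain (no extrapolation)
lat_f = np.arange(30.0, 31.91, 0.1)
lon_f = np.arange(90.0, 91.91, 0.1)
fine = bilinear_resample(coarse, lat_c, lon_c, lat_f, lon_f)
```

For a field that is exactly linear in latitude and longitude, bilinear interpolation reproduces it exactly, which makes the sketch easy to verify.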

Digital Elevation Model (DEM)
This study employed high-resolution Digital Elevation Model (DEM) data from the Shuttle Radar Topography Mission (SRTM) provided by the National Aeronautics and Space Administration (NASA; http://www2.jpl.nasa.gov/srtm/dataprod.htm). The SRTM DEM data have a high spatial resolution and cover more than 80% of the Earth's land surface. Topographic variation plays a critical role in local orographic precipitation over the TP (Johansson & Chen, 2003; Ma, Zhao, et al., 2019). To keep consistency with the precipitation datasets, the original SRTM DEM with a resolution of 90 m × 90 m was resampled to 0.1° × 0.1° (approximately 10 km × 10 km) using the cubic convolution method. Notably, the DEM is particularly important given the complex terrain of the TP.

Meteorological Data
Gauge observation is generally regarded as the most reliable source of meteorological data. As noted above, however, meteorological gauges on the TP are sparse and difficult to maintain. Fortunately, ERA5 provides meteorological data that are physically closely related to precipitation, which can help fill this gap. Referring to previous studies, we preliminarily screened several meteorological variables as model inputs, including dew point temperature, temperature, surface pressure, and wind speed (Ali et al., 2018; Johnson & Hamilton, 1988; Trenberth & Shea, 2005). Although the meteorological information provided by reanalysis data may not be as accurate as gauge records, its overall performance has been recognized in many studies (Hersbach et al., 2020; Olauson, 2018; Yao et al., 2021). Therefore, meteorological information from ERA5 was chosen for our study because its assimilation process considers the physical relationships among the various meteorological variables. The ERA5 variables used here are single-level variables; for example, temperature and dew point temperature are both at about 2 m. The meteorological data were preprocessed with the same approaches as the ERA5 precipitation data.

Evaluation Metrics
Quantitative validation of precipitation product performance hinges on the careful selection of metrics. The meteorological domain typically emphasizes the probability of precipitation events, whereas the hydrological field extends its focus to precipitation intensity. Consequently, our validation approach assesses precipitation products from both meteorological and hydrological perspectives. Several statistical metrics are used to evaluate the performance of the generated precipitation products against rain gauges. These metrics fall into two categories, based on probability and on statistics. On the one hand, the Probability of Detection (POD) and False Alarm Ratio (FAR) reflect the ability to detect precipitation events: POD shows how frequently the product correctly detects precipitation, while FAR reveals the opposite condition. On the other hand, the Correlation Coefficient (CC), Root Mean Square Error (RMSE), and Relative Bias (RB, i.e., Mean Relative Error) quantify the statistical correlation, error, and deviation between the precipitation product and the observed amounts. CC describes the correlation between the precipitation products and gauge observations, while RMSE measures the average error magnitude. In addition, the Critical Success Index (CSI) and Kling-Gupta Efficiency (KGE) summarize the overall capability of a dataset in terms of probability and statistics, respectively. Table 2 lists the formulas and descriptions of the statistical metrics used in this study. More detailed explanations of these metrics can be found in Yong et al. (2013) and Tang et al. (2020).
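A minimal sketch of these metrics follows. The 0.1 mm/day rain/no-rain threshold is an assumption (the study's actual threshold may differ), and the variability ratio γ is computed here from coefficients of variation, one common form of KGE; Table 2 gives the paper's exact definitions:

```python
import numpy as np

def scores(est, obs, thr=0.1):
    """Verification metrics for a precipitation estimate against a reference.
    thr (mm/day) separates rain from no-rain when counting events."""
    est, obs = np.asarray(est, float), np.asarray(obs, float)
    h = np.sum((est >= thr) & (obs >= thr))   # hits
    m = np.sum((est < thr) & (obs >= thr))    # misses
    f = np.sum((est >= thr) & (obs < thr))    # false alarms
    pod = h / (h + m)
    far = f / (h + f)
    csi = h / (h + m + f)
    cc = np.corrcoef(est, obs)[0, 1]
    rmse = np.sqrt(np.mean((est - obs) ** 2))
    rb = np.sum(est - obs) / np.sum(obs)
    beta = est.mean() / obs.mean()                                # bias ratio
    gamma = (est.std() / est.mean()) / (obs.std() / obs.mean())   # variability ratio
    kge = 1 - np.sqrt((cc - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return dict(POD=pod, FAR=far, CSI=csi, CC=cc, RMSE=rmse, RB=rb, KGE=kge)

# A perfect estimate should score POD = CSI = 1, FAR = RMSE = RB = 0, KGE = 1
perfect = scores([0, 2, 5, 0, 1], [0, 2, 5, 0, 1])
```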

Principal Component Analysis (PCA)
Numerous studies in machine learning have demonstrated that the dimensionality of the input data has a significant impact on training results (Araki et al., 2016; Mohammed et al., 2011). Insufficient data dimensions can result in an undertrained model, whereas excessive data dimensions can lead to overfitting (George & Vidyapeetham, 2012). Therefore, we utilized Principal Component Analysis (PCA) to find the optimal data dimension. PCA is a statistical method for identifying correlations among multiple variables. It seeks the underlying structure of the data using a small number of principal components, obtained by converting the original random vector into a new one with uncorrelated components through an orthogonal transformation.
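A small sketch of how PCA's explained-variance ratio can guide the choice of input dimension, using synthetic redundant features; the data, the 98% cutoff, and the feature counts here are illustrative, not the study's actual inputs:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Nine synthetic features, three of which are linear mixes of the first three,
# mimicking the linearly related information among the nine merging inputs.
base = rng.normal(size=(500, 6))
redundant = base[:, :3] @ rng.normal(size=(3, 3))
X = np.hstack([base, redundant])          # shape (500, 9), rank 6

pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)
# Smallest dimension whose cumulative explained variance reaches 98%
n_opt = int(np.searchsorted(cum, 0.98) + 1)
```

Because the synthetic matrix has rank 6, the cumulative curve saturates at six components, which is the kind of elbow the study reads off Figure 4a.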

TML Model
Among existing TML models, XGBoost and RF stand out, representing two distinct paradigms of ensemble machine learning (i.e., boosting and bagging). XGBoost is designed to improve upon the Gradient Boosting Decision Tree (GBDT; Friedman, 2001). Compared to GBDT, XGBoost is based on a second-order Taylor expansion and introduces regularization. XGBoost builds its model in several steps. First, a tree with zero depth is created, and all available features are enumerated for each leaf node. Second, the training samples belonging to the node are sorted in ascending order of the feature value; the optimal splitting point of the feature is determined by a linear scan, and the maximum gain of the feature is recorded. Third, the feature with the greatest gain is selected for partitioning, and the corresponding sample set is associated with each new node. This process is repeated recursively until certain conditions are satisfied (Chen & Guestrin, 2016). On the other hand, RF is an ensemble learning algorithm consisting of multiple decision trees, which typically outperforms a single tree (Breiman, 2001). The general procedure for running RF is as follows. First, samples are randomly drawn from the training dataset. Next, multiple rounds of sampling create a new training subset, denoted D, together with a random selection of m features. Third, using D and m, a complete decision tree is trained, and this is repeated several times. Lastly, the results obtained from all decision trees are integrated to generate the final output.
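The boosting-versus-bagging contrast can be illustrated with scikit-learn. Note that `GradientBoostingClassifier` is used here only as a stand-in for XGBoost (the study's actual boosting model), and the rain/no-rain data are synthetic:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic rain/no-rain problem: the label depends nonlinearly on two inputs;
# the remaining four columns are noise features.
X = rng.normal(size=(1000, 6))
y = ((X[:, 0] + 0.5 * X[:, 1] ** 2) > 0.5).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

gbt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)      # boosting
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)  # bagging
acc_gbt, acc_rf = gbt.score(X_te, y_te), rf.score(X_te, y_te)
```

Boosting fits trees sequentially on the residual errors of the ensemble so far, whereas bagging trains trees independently on bootstrap samples and averages their votes; both should recover this simple decision boundary well.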
Note. G_i refers to the baseline (CMPA or independent observations) and Ḡ is the average of the baseline; S_i and S̄ denote the precipitation estimates and their average; n refers to the number of samples. H represents the number of precipitation events hit by the precipitation products; M represents the number of precipitation events missed by the precipitation products; F represents the number of precipitation events falsely reported by the precipitation products. r is the CC between the reference (gauge measurements) and target (precipitation estimates) datasets; β is the bias ratio; γ is the variability ratio; μ is the mean precipitation; and σ is the standard deviation.

Water Resources Research, 10.1029/2023WR035643 (Lyu and Yong)

Furthermore, classical SVM and KNN are included as reference models. SVM is a model that employs a hypothesis space of linear functions in a high-dimensional feature space and is trained using a learning algorithm from optimization theory. The general framework for running SVM can be summarized as follows: (a) in the linear case, the problem is transformed into a convex optimization problem, which can be simplified by the Lagrange multiplier method; and (b) in the nonlinear case, a kernel function is used to map the samples into a high-dimensional space, reducing the problem to the linear case (Chen et al., 2005). KNN is an algorithm that determines the category of a sample by identifying the nearest one or several samples with known categories. The general steps of KNN are as follows. First, the algorithm calculates the distance, typically the Euclidean distance, between the test sample and every other sample in the dataset. Second, the K points with the shortest distances are selected and sorted. Lastly, the category of the test sample is determined by majority voting among its K nearest neighbors (Kramer, 2013).
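The KNN majority-vote procedure described above can be written directly in NumPy (toy data, Euclidean distance, k = 3):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    d = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every sample
    nearest = np.argsort(d)[:k]              # indices of the k closest samples
    votes = y_train[nearest]
    return int(np.bincount(votes).argmax())  # majority class among the k votes

# Two small clusters with known labels
X_train = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9], [0.2, 0.0]])
y_train = np.array([0, 0, 1, 1, 0])
label = knn_predict(X_train, y_train, np.array([0.05, 0.05]))  # near cluster 0
```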

DL Model
LSTM is a DL algorithm based on the Recurrent Neural Network (RNN; Sherstinsky, 2020) and is suitable for problems with long time-series dependencies. LSTM alleviates the exploding- and vanishing-gradient problems of the RNN through the addition of a "Forget Gate", an "Input Gate", and an "Output Gate".
i_t = δ(W_i x_t + U_i h_(t-1) + b_i) (1)
f_t = δ(W_f x_t + U_f h_(t-1) + b_f) (2)
g_t = tanh(W_g x_t + U_g h_(t-1) + b_g) (3)
o_t = δ(W_o x_t + U_o h_(t-1) + b_o) (4)
c_t = f_t ⊙ c_(t-1) + i_t ⊙ g_t (5)
h_t = o_t ⊙ tanh(c_t) (6)

where δ is the activation function; i_t is the input gate; f_t is the forget gate; g_t is the candidate cell state; o_t is the output gate; c_t is the long-term memory; h_t is the short-term memory; W, U, and b are the weights and biases of each gate; and ⊙ denotes element-wise multiplication.
The operation of an LSTM can be summarized in the following steps. First, the "Forget Gate" decides which information should be removed from the cell state. Second, the "Input Gate" and a tanh layer decide which new information should be stored in the cell state. Next, the old cell state is updated. Finally, based on the filtered cell state, a decision is made about the output. The cell state thus acts like a conveyor belt that constantly carries information forward (Hochreiter & Schmidhuber, 1997).
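The gate mechanism can be sketched as a single NumPy LSTM step; the stacked-parameter layout and random weights below are an implementation convenience for illustration, not the paper's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the parameters of the four gates
    (input i, forget f, candidate g, output o), each of size n_hid."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:n])             # input gate
    f = sigmoid(z[n:2 * n])        # forget gate
    g = np.tanh(z[2 * n:3 * n])    # candidate cell state
    o = sigmoid(z[3 * n:])         # output gate
    c = f * c_prev + i * g         # long-term memory, cf. the cell-state update
    h = o * np.tanh(c)             # short-term memory (the step's output)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                 # run a short random input sequence
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```

Because the output gate is a sigmoid and tanh is bounded, the short-term memory h always stays strictly inside (-1, 1), while the cell state c can grow to carry longer-range information.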

DML Strategy
Building upon the works of Zhang et al. (2021) and Lei et al. (2022), we introduce a novel strategy called Double Machine Learning (DML) that merges TML and DL techniques in a two-step process to produce highly accurate precipitation estimates. The proposed strategy simplifies the complex task of precipitation estimation by dividing it into two sub-problems: (a) identifying the occurrence of precipitation at specific grid points, and (b) estimating the amount of precipitation if it occurs. The flowchart of the DML strategy is displayed in Figure 3, and its production process comprises four main steps, as follows.
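A minimal end-to-end sketch of the two-step idea on synthetic data; the scikit-learn classifier and regressor below stand in for the paper's XGBoost and LSTM, and all data-generating rules are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(800, 6))                       # six merging input features
rain = (X[:, 0] > 0.3).astype(int)                  # step 1 target: occurrence
amount = np.where(rain == 1, np.abs(X[:, 1]) * 5 + 1, 0.0)  # step 2 target: amount

# Step 1: classify whether precipitation occurs at each grid point
clf = GradientBoostingClassifier(random_state=0).fit(X, rain)
# Step 2: regress the amount, trained only on rainy samples
reg = RandomForestRegressor(random_state=0).fit(X[rain == 1], amount[rain == 1])

occurs = clf.predict(X)
merged = np.where(occurs == 1, reg.predict(X), 0.0)  # zero wherever no rain predicted
```

Splitting occurrence from amount lets each model specialize: the classifier handles the highly imbalanced rain/no-rain decision, and the regressor only ever sees positive rainfall values.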

Data Preprocessing
First, all datasets were unified to a spatial resolution of 0.1° using the bilinear interpolation method, in which the value of a new grid cell is calculated from the four pixels closest to it. This interpolation method has been commonly applied to resample gridded datasets (Beck et al., 2019; Satgé et al., 2020). Then, all the data were normalized and standardized to improve training efficiency.
f(x) = (x_i − x_min) / (x_max − x_min)

where f(x) is the result after normalization; x_i (i = 1, 2, …) is the original value; and x_min and x_max are the minimum and maximum of the series.

f(x) = (x_i − μ) / σ

where f(x) is the result after standardization; x_i (i = 1, 2, …) is the original value; and μ and σ are the mean and standard deviation.
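Both preprocessing transforms can be implemented in a few lines of NumPy:

```python
import numpy as np

def min_max_normalize(x):
    """f(x) = (x - min) / (max - min), mapping values into [0, 1]."""
    x = np.asarray(x, float)
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """f(x) = (x - mu) / sigma, giving zero mean and unit standard deviation."""
    x = np.asarray(x, float)
    return (x - x.mean()) / x.std()

z = standardize([1.0, 2.0, 3.0, 4.0])
n = min_max_normalize([1.0, 2.0, 3.0, 4.0])
```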
Second, PCA was employed to ascertain the optimal number of input features.
Initially, nine candidate features for the final precipitation merging were collected, informed by existing literature and empirical knowledge. These features mainly come from the datasets introduced in Section 2: IMERG-Final, GSMaP-Gauge, ERA5, 2-m dew point temperature, 2-m temperature, surface pressure, the U-component of 10-m wind, the V-component of 10-m wind, and the DEM. Subsequently, PCA was utilized to determine the number of features that strikes a balance between the advantages and costs of dimensionality expansion.
As illustrated in Figure 4a, when utilizing more than six input features, the selected features collectively explained 98.4% of the variance of the original nine. This suggests the presence of linearly related information among the full nine features. However, reducing the number of features below six resulted in a significant loss of explanatory efficiency relative to the original condition. Consequently, it was deemed most appropriate to retain six features, as indicated by the elbow point in Figure 4a. Further insights are offered in Figures 4b and 4c. Figure 4b reveals that the explained variance ratio when using one feature accounted for only 48.3% of that when employing all nine features. As the number of features increased, the explanatory efficiency did not rise proportionally, owing to the linear relationships among these features; this further verifies the necessity of controlling the number of features. Figure 4c illustrates the motivation for adopting the PCA algorithm to search for the optimal number of features. In our experiments, employing all nine features brings a risk of information redundancy, implying higher interpretation complexity and larger computational costs. Conversely, approximately 52% of the total input information was lost when using only one feature. The transition between these two extreme cases is monotonic but nonlinear. Here, PCA is used to determine the optimal number of features, which keeps a good balance between information redundancy and information loss. Next, we also applied the RF algorithm to repeat the same procedure as PCA.
Figure 4d shows the percentage of feature importance derived from RF. As can be seen, approximately 95% of the total input information is captured by employing six features in the RF-based optimization; the corresponding value is 98.6% for PCA (see Figure 4a). The results from both algorithms confirm that the choice of six features is the most appropriate for our experimental study.
As documented, the PCA algorithm is capable of compressing all data into a new reference coordinate system, but it normally assumes that each set of input data contributes equally, owing to its unsupervised nature. In our study, the DEM dataset lacks physical variability in the time series. In addition, it is difficult to reflect the full impact of wind on precipitation in the current merging model, because wind speed must be considered simultaneously in both the U and V directions. Moreover, the contributions of these variables to the training process are not as significant as those of the other features. As a result, we chose to manually exclude these two types of data to efficiently control the number of input features. In conclusion, the final inputs of our merging model comprise IMERG-Final, GSMaP-Gauge, ERA5, surface pressure, 2 m dew point temperature, and 2 m temperature. Third, the CMPA gauge data were processed to generate labels for both the classification and regression tasks. The classification labels indicate the occurrence of precipitation events, denoted by 1 and 0, while the regression labels contain the precipitation amounts. In practice, four complete years of CMPA data (2016-2019) were used for training, while the data from 2015 were set aside for validation.
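The label construction and year-based split can be sketched as follows; the variable names, toy values, and the 0.1 mm/day wet threshold (borrowed from the CMA light-rain bound used later in the evaluation) are assumptions for illustration:

```python
import numpy as np

# Hypothetical CMPA daily precipitation values (mm/day) at some grids.
WET_THRESHOLD = 0.1  # mm/day; assumed rain/no-rain cut-off

cmpa = np.array([0.0, 0.05, 0.3, 12.7, 0.0, 4.1])
cls_labels = (cmpa >= WET_THRESHOLD).astype(int)     # 1 = rain event, 0 = dry
reg_labels = np.where(cls_labels == 1, cmpa, 0.0)    # amount only where rainy

# Year-based split: 2016-2019 for training, 2015 held out for validation.
years = np.array([2015, 2016, 2017, 2018, 2019])
train_years = years[years != 2015]
valid_years = years[years == 2015]
print(cls_labels, train_years, valid_years)
```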

Classifying Precipitation Events by TML
The preprocessed datasets were used to pre-train four types of TML models (i.e., RF, XGBoost, SVM, and KNN) with five-fold cross-validation. In five-fold cross-validation, the dataset is first partitioned into five subsets. In each iteration, four of these subsets serve as the training data, while the remaining one is used for validation. This process is repeated five times, and the results of the five experiments are considered collectively. Default model settings and parameters were used for all models during this step. The accuracy scores of the four models are presented in Figure 5a. The results show that XGBoost generally outperforms the other three models in our experiments; therefore, XGBoost was selected as the classification model. To further improve its accuracy, we optimized the model's five hyperparameters using grid search, raising the accuracy score from 84.5% to 88.4% (see Figure 5b). Using the trained XGBoost model, the occurrence of precipitation events can be efficiently estimated.
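The screening-then-tuning workflow can be sketched with scikit-learn; this is a hedged illustration on synthetic data, with `GradientBoostingClassifier` standing in for XGBoost to keep the sketch dependency-free (the paper uses the xgboost library itself, and its actual hyperparameter grid is not given):

```python
import numpy as np
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the preprocessed six-feature dataset.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy rain/no-rain labels

# Step 1: compare candidate classifiers with default settings via 5-fold CV.
models = {
    "RF": RandomForestClassifier(random_state=2),
    "GBT (XGBoost stand-in)": GradientBoostingClassifier(random_state=2),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")

# Step 2: tune the winning model with a (hypothetical) small grid search.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=2),
    {"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=5, scoring="accuracy",
).fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```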

Estimating Precipitation Amount by DL
After the XGBoost-based model provides the classification results, only the information pertaining to precipitation events is retained, including the six features and the regression labels from CMPA. For the precipitation amount estimates, the training dataset consists of 80% of the whole dataset (i.e., 2016-2019), while the remaining 20% (i.e., 2015) is reserved for validation.
The LSTM model was configured with two layers and a hidden size of 64. It was then trained on the training dataset with the following hyperparameters: batch size = 31, learning rate = 0.001, and time step = 1. These hyperparameter settings, which follow previous studies (Wu et al., 2020), were employed throughout the process. The MSE loss was computed, and the model parameters were optimized through backpropagation. Figure 5c illustrates the gradual decrease and eventual stabilization of the loss for both the training and test datasets. Typically, the inflection point represents the optimal number of training epochs, as improper training can lead to either overfitting or underfitting. In this case, the loss stabilized after 20 epochs and remained steady beyond 40 epochs; therefore, we set the number of training epochs to 30. Once trained, the LSTM regression model can generate precipitation amounts for the subsequent steps.
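A minimal PyTorch sketch of this regression stage is given below. Only the two-layer / hidden-size-64 architecture, MSE loss, batch size 31, learning rate 0.001, and time step 1 come from the text; the synthetic tensors, the single-output linear head, and the five-iteration loop are assumptions for illustration:

```python
import torch
from torch import nn

class PrecipLSTM(nn.Module):
    """Two-layer LSTM (hidden size 64) with a linear head for rainfall amount."""
    def __init__(self, n_features=6, hidden=64, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # precipitation amount (mm/day)

    def forward(self, x):                 # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predict from the last time step

torch.manual_seed(0)
model = PrecipLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(31, 1, 6)                 # batch size 31, time step 1, 6 features
y = torch.rand(31, 1)                     # placeholder rainfall amounts
for _ in range(5):                        # a few illustrative iterations
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                       # backpropagation of MSE loss
    opt.step()
print(float(loss))
```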

Choosing Precipitation Background Field by Validation
The results derived from the classification and regression models are subsequently integrated in the following stage to produce the final DML-TPP. Specifically, the precipitation event information and precipitation amounts are incorporated into an existing precipitation background field, replacing the original precipitation estimates. The resulting products are named DML-IMERG, DML-GSMaP, and DML-ERA5, according to their precipitation background fields (IMERG-Final, GSMaP-Gauge, and ERA5, respectively). The different precipitation background fields are compared to select the best one while keeping all other input conditions consistent. Thus, the outputs from Steps A to C all use the aforementioned six input features, that is, IMERG-Final, GSMaP-Gauge, and ERA5, along with the meteorological information. Furthermore, we compared the resulting precipitation products based on the different backgrounds using six validation metrics (i.e., CC, RMSE, RB, POD, FAR, and CSI; see Figure 6). This step aims to choose the best background according to the validation results.
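One plausible reading of this replacement step can be sketched as follows; the toy arrays and the rule of zeroing classified-dry grids are assumptions, not the paper's exact procedure:

```python
import numpy as np

# Hypothetical 2x2 grid of daily values (mm/day).
background = np.array([[0.4, 2.1], [0.0, 7.9]])      # e.g., IMERG-Final field
rain_mask = np.array([[0, 1], [0, 1]], dtype=bool)   # XGBoost classification
regressed = np.array([[0.0, 1.8], [0.0, 9.2]])       # LSTM amount estimates

# Start from the background field, then overwrite it with the DML outputs:
# classified-wet grids take the regressed amount, classified-dry grids are
# set to zero (an assumed interpretation of "replacing the original
# precipitation estimates").
merged = background.copy()
merged[~rain_mask] = 0.0
merged[rain_mask] = regressed[rain_mask]
print(merged)
```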
As depicted in Figure 6, the original IMERG-Final, GSMaP-Gauge, and ERA5 serve as comparison references. The initial evaluation results already demonstrate the effectiveness of the DML strategy, although the main purpose of this step is to select the appropriate precipitation background field. Based on the comparison, DML-IMERG, with the best statistical metrics, was selected as the output product of the DML strategy and is referred to as DML-TPP hereafter. A more detailed analysis of the results is presented in the following two sections.

Overall Performance
The scatter density plots comparing the precipitation products and CMPA gauges are presented in Figure 7. Overall, DML-TPP exhibits the best performance, particularly in identifying precipitation events. Specifically, DML-TPP outperforms the other three precipitation products with respect to POD (0.63), FAR (0.07), and CSI (0.59). This is mainly attributed to the ability of the DML strategy to identify precipitation events beforehand and separately. Additionally, DML-TPP has the strongest correlation with the CMPA observations, that is, the highest CC value of 0.76. Since the DML strategy does not use CMPA gauge information after training is completed, this good performance can be attributed to the strategy itself, particularly the DL algorithm. In fact, previous studies have reported that DL algorithms can visibly enhance CC values (Wang et al., 2020; Wu et al., 2020).
In terms of RMSE and RB, DML-TPP still performs well. Specifically, the RMSE value of DML-TPP (3.73) is higher than that of GSMaP-Gauge (3.56) but lower than those of the other two products. In general, a lower RMSE value indicates better accuracy; however, a higher RMSE can also indicate increased sensitivity to extreme precipitation, which is discussed in further detail in Section 4.1.3. Regarding the RB values, all four products overestimate the total precipitation amount. DML-TPP performs better than GSMaP-Gauge and ERA5, but slightly worse than IMERG-Final (by only 0.02%).
Furthermore, according to the scatter distributions of the four products, we can summarize the adjustments made by the DML-TPP estimates: (a) it notably reduces the scattered points below the 45° line, particularly during light precipitation events, leading to a reduction in the FAR; (b) it generally aligns the scattered points representing high-intensity precipitation more closely with the gauge-based estimates, albeit with slight overestimation; and (c) it combines the superiority of the GPM products (i.e., IMERG-Final and GSMaP-Gauge) in retrieving rainfall amounts at lower rain rates with the advantage of ERA5 in detecting precipitation anomalies at higher ones.
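The six statistics quoted throughout this evaluation can be computed as follows; this is an illustrative sketch in which the 0.1 mm/day wet threshold and the sample series are assumptions, not the paper's data:

```python
import numpy as np

def metrics(est, obs, thr=0.1):
    """CC, RMSE, RB plus POD/FAR/CSI at an assumed wet threshold (mm/day)."""
    cc = np.corrcoef(est, obs)[0, 1]
    rmse = np.sqrt(np.mean((est - obs) ** 2))
    rb = (est.sum() - obs.sum()) / obs.sum() * 100      # relative bias, %
    e, o = est >= thr, obs >= thr
    hits = np.sum(e & o)          # both wet
    misses = np.sum(~e & o)       # observed wet, estimated dry
    false = np.sum(e & ~o)        # estimated wet, observed dry
    pod = hits / (hits + misses)
    far = false / (hits + false)
    csi = hits / (hits + misses + false)
    return dict(CC=cc, RMSE=rmse, RB=rb, POD=pod, FAR=far, CSI=csi)

obs = np.array([0.0, 0.3, 5.2, 0.0, 12.0, 0.05])  # toy gauge series
est = np.array([0.0, 0.5, 4.8, 0.2, 13.1, 0.0])   # toy product series
print(metrics(est, obs))
```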

Seasonal Evaluation
The Taylor diagrams in Figure 8 display the performance of the four precipitation products across the four seasons. In a Taylor diagram, points closer to the "Observation" point on the X-axis indicate better accuracy. Overall, DML-TPP exhibits significantly better performance in summer and a slight advantage in spring and autumn. As summer experiences the highest frequency and amount of precipitation, it is a crucial period for data users. DML-TPP lies closest to the observation, with better performance indicated by the CC and RMSE (i.e., RMSD) values. As detailed in Section 4.1.1, it can be inferred that the higher RMSE values (i.e., poorer performance) primarily correspond to spring and winter, rather than summer and autumn.
Inconsistencies in the quality of the gauge labels collected during different seasons are an important cause, playing a significant role in the reliability and accuracy of the estimates. In winter, especially in high, cold regions, in-situ precipitation observations often display notable variability. This variability poses additional challenges to the DML-based models, leading to instability in the calibration values. Additionally, the presence of diverse precipitation types during the winter months further complicates precipitation estimation. To tackle these difficulties, DML-TPP may need modifications to the existing strategy (e.g., the exclusion of solid precipitation samples from LSTM regression training) or the incorporation of additional information.

Intensity Evaluation
The intensity and frequency of precipitation vary across the TP, where heavy rainfall contributes most to the total precipitation amount but occurs infrequently. However, the existing evaluation metrics do not consider this variability; thus, a detailed assessment under different precipitation intensities is necessary. To verify the performance of each precipitation product, precipitation events are divided into three categories based on CMA standards: light rain (0.1-5 mm/day), moderate rain (5-50 mm/day), and heavy rain (>50 mm/day).
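The CMA-style binning quoted above can be written directly; the function name is an illustrative assumption, while the bin edges follow the thresholds in the text:

```python
def intensity_class(p):
    """Classify a daily precipitation value p (mm/day) into CMA-style bins."""
    if p < 0.1:
        return "no rain"
    if p < 5:
        return "light"      # 0.1-5 mm/day
    if p <= 50:
        return "moderate"   # 5-50 mm/day
    return "heavy"          # >50 mm/day

daily = [0.0, 0.4, 7.3, 62.0]
print([intensity_class(p) for p in daily])
```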
Figure 9 shows histograms of three evaluation metrics (i.e., CC, RMSE, and RB) for the four precipitation products. It is evident from Figure 9a that DML-TPP exhibits the highest CC value under all three precipitation intensities. The RMSE results provide a further explanation of the findings mentioned in Section 4.1.1: the RMSE of DML-TPP is higher under moderate and light precipitation intensities, while the opposite is true under high intensity, which ultimately results in an overall higher RMSE for DML-TPP. The higher RMSE at moderate and light intensities may indicate that DML-TPP better reflects the differences among light precipitation events, while the lower RMSE at high rain rates indicates more precise estimates of extreme precipitation.
However, as shown in Figure 9c, we must acknowledge that DML-TPP overestimates light precipitation, which is also one of the reasons for its high RMSE value. Therefore, the advantage of DML-TPP in RB mainly lies at moderate and high precipitation intensities rather than at light rain rates. Interestingly, DML-TPP is the only product that slightly overestimates precipitation at moderate rain rates, while the other products significantly underestimate it. Additionally, the RB value of DML-TPP is less biased than those of the other products under high precipitation intensity. Such an improvement is meaningful, as moderate and high precipitation intensities are not only essential for hydrometeorological applications but also represent one of the weaknesses of GPM observations.

Comparison and Validation Based on Independent Observations
There may be doubts about the reuse of the CMPA data in the training and production stages. Although the DML strategy avoids this possibility by strictly isolating the training set from the validation set, we illustrate and prove this point in this section. Independent observations were deliberately included as additional validation data. Furthermore, the independent gauges can partly compensate for the sparsity of CMPA gauges in the central and western TP. The evaluation results of the four products based on 23 independent rain gauges are shown in Figure 10. The results show that DML-TPP performs significantly better than IMERG-Final and GSMaP-Gauge in all evaluation metrics except FAR. However, DML-TPP, along with IMERG-Final and GSMaP-Gauge, performs worse than ERA5 in statistical metrics such as CC, RMSE, and RB. For the probability metrics (POD, FAR, and CSI), DML-TPP is slightly better than ERA5 (Figure 10d). These contrasting results can primarily be attributed to the selection of a satellite-based product (i.e., IMERG-Final) as the precipitation background field, whereas ERA5 exhibits significantly higher quality than the satellite-based products in this region. In addition, it should be noted that both CMPA and the independent observations fall far short of the basic requirements for precipitation measurement. The rain gauges are often situated below mountains, making it challenging for a single observation point to capture the overall precipitation within a 0.1° grid (approximately 10 km). These limitations can lead to some inconsistencies between CMPA and the independent observations. Therefore, further investigation is necessary to gain a deeper understanding of this issue. To investigate this problem, we employ the KGE, which combines three dimensions of statistical evaluation. Figures 11a-11d display the KGE distributions of the products and their respective background precipitation fields based on the independent observations, ranked from good to bad. Similarly, Figures 11e-11h display the same but based on the CMPA gauges.
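The KGE, in its commonly used form, combines correlation, a variability ratio, and a bias ratio into a single score with an optimum of 1; a minimal implementation (toy series are illustrative assumptions) might look like this:

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta Efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    r = np.corrcoef(sim, obs)[0, 1]          # linear correlation
    alpha = np.std(sim) / np.std(obs)        # variability ratio
    beta = np.mean(sim) / np.mean(obs)       # bias ratio
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = np.array([0.0, 1.2, 3.4, 0.5, 8.0])
print(kge(obs, obs))                 # identical series give the optimum of 1
print(round(kge(obs * 1.2, obs), 3)) # a 20% multiplicative bias lowers the score
```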
We recognized that the observed bias in the results might arise from the differences between IMERG-Final and ERA5 as precipitation background fields. To investigate this conjecture, we produced DML-IMERG and DML-ERA5 using the same outputs of the classification and regression models but different precipitation backgrounds; DML-IMERG here is identical to the DML-TPP of Section 3.3. It can be clearly seen that the evaluation conclusions differ depending on the ground reference, and the performance of the DML products is tied to their own precipitation background. More specifically, although DML-ERA5 achieves effects similar to ERA5, it also suffers a decrease in quality under the CMPA gauge-based evaluation, even though DML-IMERG and DML-ERA5 use the same DML outputs. This indicates that the changes in the evaluation results are mainly due to the change in the precipitation background rather than the DML strategy itself.
Another contributing factor is the uneven distribution of the CMPA rain gauges. An analysis of the gauge distribution reveals a notable absence of gauges in the central mountainous region of the TP. The fundamental principle of DML is to acknowledge the existence of a complex yet approximately nonlinear relationship between the multi-source data and the actual precipitation at each grid. Therefore, DML models trained extensively on rain gauge data from the southeastern region may not perform optimally in the central TP.

Effects of Multi-Source Inputs on Precipitation Products
Our validation results show that DML-TPP performs well on the TP because the DML strategy combines the advantages of TML and DL. However, whether and how the chosen multi-source input (comprising satellite retrievals, reanalysis data, and meteorological information) shapes the final product still needs to be examined; to this end, we removed each input in turn and compared the resulting products (Figure 12). Both Figures 12e and 12f indicate that the integration of meteorological information and satellite retrievals decreases the total precipitation estimate of DML-TPP, while Figure 12g demonstrates that ERA5 increases the estimated precipitation amount. Through the use of multi-source data, the robustness of the final DML-TPP can be controlled, and the respective role of each source can be inferred from the results. Specifically, Figure 12e suggests that the meteorological information has little effect on the precipitation estimates in the eastern TP and the Qaidam Basin, but significantly affects the central and western TP, where rain gauges are scarce. Figure 12f shows that GSMaP-Gauge helps adjust the total precipitation estimates across the entire region, benefiting from large-scale satellite observations. Finally, Figure 12g demonstrates that ERA5 captures the heterogeneous pattern of precipitation change at the regional scale. The results of the quantitative evaluation further support these findings, as presented in Table 3.
DML-TPP with the full multi-source input demonstrates superior performance in detecting precipitation events, exhibiting the best POD, FAR, and CSI values. Thus, the capacity for estimating precipitation events is improved by taking multiple sources of information into account. DML-NoMe has the best CC and relatively good RMSE and RB values in gauged regions. As our analysis indicates, the meteorological information plays a crucial role in supporting estimates for regions without precipitation measurements: without it, the estimates for areas with available data may improve slightly, but significant uncertainty can arise in areas lacking data. Furthermore, we observed that the reanalysis data and the satellite-based estimates have contrasting effects on the estimated precipitation amounts (as shown by RB; DML-NoGs: 15.56%, DML-NoEr: 5.43%). Therefore, we recommend combining these data sources as input (DML-TPP: 10.68%). In conclusion, the quantitative evaluation demonstrates that incorporating multi-source information improves the overall performance of the final product.

Model Selection Based on Theoretical Analysis
The DML strategy exhibits substantial flexibility, allowing comparisons among various TML and DL models to identify the optimal algorithm combination under specific conditions. Thus, it is crucial to elucidate the basic principles and theoretical foundations guiding the selection of TML and DL models within the DML strategy.
The choice between TML and DL depends on the inherent complexity of the problem. TML is characterized by simplicity and efficiency, meaning that no extra model construction needs to be considered; it only requires preprocessed data to fit the model structure. Consequently, TML models can operate with relatively high efficiency in simple classification tasks. Compared with TML, DL models offer a higher upper limit in model capability, characterized by sizable architectures and numerous parameters. Notably, some studies have confirmed that DL clearly outperforms TML given sufficient data and adequate training (Hamori et al., 2018). However, employing DL methods for classification tasks in practical applications, such as the dry-wet classification in this study, can also yield good results but at a higher computational cost and with a greater overfitting risk. Further numerical evaluations can be found in Table S2 in Supporting Information S1. Therefore, we chose TML models to address the classification problem and DL models to tackle the regression challenge.
In the realm of TML, ensemble machine learning models (e.g., RF and XGBoost) have developed significantly over the past decades, surpassing most other traditional algorithms by a considerable margin. Ensemble machine learning comprises two distinct categories: the bagging paradigm, exemplified by RF, and the boosting paradigm, represented by XGBoost. In this study, for example, both RF and XGBoost significantly outperform the classical SVM and KNN. However, the choice between the two models requires further investigation. Hence, we recommend XGBoost or RF and suggest selecting the most suitable algorithm for the given problem and data.
DL is currently experiencing a period of rapid expansion, characterized by the continuous introduction of new algorithms showcasing exceptional performance. However, it is important to acknowledge that these achievements are often limited to, and reliant on, particular datasets or problem domains. For practical applications, adopting the latest DL models can entail a steep learning curve and substantial computational costs. Additionally, DL models depend heavily on hyperparameter choices, often necessitating manual fine-tuning. Hence, for practical applications, we favor established solutions with proven stability, such as LSTM. Nevertheless, we encourage and anticipate the exploration of the latest model combinations to unlock the full potential of the DML strategy.

Limitation and Outlook
Despite the algorithmic efficiency and robustness proven in the statistical validation, inadequacies remain in the current DML strategy, such as the use of a precipitation background. Initially, we did not intend to set a precipitation background, instead directly using the outputs of the DML strategy to produce precipitation products. However, such an approach, lacking a precipitation background field, often leads to discrete samples that do not match the spatial pattern of actual rainfall, even though its evaluation results look satisfactory. This is mainly because machine learning models lack strict physical mechanisms. Although the use of precipitation backgrounds addresses this issue, it also degrades certain evaluation results. In the future, we aim to resolve this problem by adding modules to the model that incorporate physical mechanisms and constraints (Li et al., 2023).
Although the existing analysis demonstrates the reasonableness and efficiency of combining XGBoost and LSTM, the rapid development of DL models introduces new possibilities. For example, there is a growing emphasis on extracting information from a spatial perspective, exemplified by models such as ConvLSTM and TrajGRU. Recent advancements, such as attention mechanisms combined with LSTM, have proven effective in flood prediction and precipitation forecasting (Ding et al., 2020; Tao et al., 2021). An attention mechanism helps the DL model focus on the dominant precipitation patterns. In the realm of TML, while model performance has approached a bottleneck, emerging models such as LightGBM are designed to enhance computational efficiency. Last but not least, additional data have the potential to further optimize the product. Machine learning methods are data-driven, meaning that incorporating more and better data would improve the results. However, the currently available information cannot fully satisfy the needs of machine learning models, especially DL methods. Leveraging machine learning models to extract meaningful insights from a broader range of available information is essential for enhancing the data accuracy of DML-TPP. Expanding the training capacity, particularly to address the scarcity of low-intensity rainfall samples during winter, is a viable way to overcome the current limitations of DML-TPP.

Conclusions
In this study, we proposed a novel Double Machine Learning (DML) strategy, which combines the advantages of traditional machine learning and deep learning to produce a high-precision precipitation dataset for the Tibetan Plateau (DML-TPP; 0.1° × 0.1°, daily, 2015-present). In the DML strategy, we used the sparse gauge observations from CMPA and the meteorological information from ERA5 to effectively improve the merging accuracy of satellite precipitation estimates (IMERG-Final and GSMaP-Gauge) and reanalysis data (ERA5). The main findings of our study are as follows. The evaluation results show that DML-TPP clearly outperforms three widely used mainstream precipitation products (IMERG-Final, GSMaP-Gauge, and ERA5) in most statistical indices, especially in identifying rainy events and computing rainfall amounts at medium-high rain rates. Relative to IMERG-Final, GSMaP-Gauge, and ERA5, DML-TPP exhibits the highest CC values, lower RMSE values, and better detection of rainy events (as measured by POD, FAR, and CSI). Assisted by traditional machine learning, the DML strategy efficiently identifies rainy events (e.g., the FAR value is only 0.07), while the incorporation of deep learning notably enhances the data accuracy of the precipitation estimates (e.g., the CC value reaches 0.73). Moreover, precipitation estimates at medium-high rainfall intensities, particularly in summer, show a significant enhancement, which may contribute to a better understanding of water resource management and extreme hydrological hazards for potential data users over the TP.
In addition, we further analyzed the contributions of the different data inputs to the merging effect. The meteorological information from ERA5 was found to play a crucial role in identifying rainy events and adjusting the bias of the precipitation estimates over regions with sparse gauges, owing to the additional precipitation-related information it provides. The satellite retrievals mainly contribute to controlling the total rainfall amounts and maintaining the spatial consistency of the precipitation patterns. The function of the reanalysis data is to capture the heterogeneous pattern of precipitation change at the regional scale, because atmospheric model simulations can reflect complex physical processes. The contribution decomposition of the different data sources offers valuable insights and can serve as an important reference for improving the data accuracy and algorithmic stability of multi-source precipitation merging. In future work, more effort is needed to quantify the contribution of each input to further address this interesting topic.
Finally, we systematically investigated the model selection and merging mechanism of the proposed DML strategy. Both the theoretical analysis and the quantitative assessment confirm that the choice of XGBoost for identifying rainy events and LSTM for estimating precipitation amounts is well suited. In future work, we will be dedicated to substantially improving the merging effect of DML by integrating additional data sources, such as multilayer atmospheric data and underlying surface information. This will help satisfy hydrometeorological users' demands for higher spatiotemporal resolutions, including sub-diurnal time steps and kilometer-scale grids. Through continued effort, we aim to further enhance the precision and applicability of DML to address the challenges of hydrological process simulation and water resource management over ungauged basins.

Figure 1 .
Figure 1. Concept diagram of the Double Machine Learning strategy.

Figure 2 .
Figure 2. (a) Distribution map of the rain gauges used in this study; (b) Topographic map of study area.

Figure 3 .
Figure 3. Flowchart of Double Machine Learning strategy.

Figure 4 .
Figure 4. (a) Process of searching for the optimal number of features using the Principal Component Analysis (PCA) algorithm; (b) Interpretation ability of each number of features; (c) Selection of six features to finally balance information loss and information redundancy; (d) Results of the numerical analysis for the same purpose as PCA, but based on the Random Forest algorithm. The Random Forest classification with default hyperparameters is applied for modeling, and the "feature importance" in scikit-learn is investigated and shown in the pie chart. Note: The numerical results are based on the following sequence: IMERG-Final, GSMaP-Gauge, ERA5, 2 m dew point temperature, 2 m temperature, surface pressure, U-component of wind at 10 m, V-component of wind at 10 m, and DEM.

Figure 5 .
Figure 5. (a) Comparison of training results of four traditional machine learning models; (b) Tuning procedure for the best model XGBoost; (c) Model training process of LSTM.

Figure 7 .
Figure 7. Scatterplots of daily precipitation comparison at the 455 selected grids between four precipitation datasets and the gauge-based CMPA over the Tibetan Plateau in 2015, including (a) DML-TPP, (b) IMERG-Final, (c) GSMaP-Gauge, and (d) ERA5. Six statistical metrics (CC, RMSE, RB, POD, FAR, and CSI) are marked on each plot.

Figure 8 .
Figure 8. Taylor diagrams for four precipitation datasets at the daily scale on the Tibetan Plateau during the study period, including spring [March-May (MAM)], summer [June-August (JJA)], autumn [September-November (SON)], and winter [December-February (DJF)]. The red cross on the X-axis represents the gauge observation.

Figure 11 .
Figure 11. (a)-(d) Spatial distributions of KGE at the daily scale computed by (a) IMERG-Final, (b) DML-IMERG (using IMERG-Final as the precipitation background field), (c) DML-ERA5 (same as DML-IMERG but using ERA5 as the precipitation background field), and (d) ERA5 against the independent observations in the inner part of the Tibetan Plateau. (e)-(h) Spatial distributions of KGE at the daily scale computed by (e) DML-IMERG, (f) IMERG-Final, (g) DML-ERA5, and (h) ERA5 against the CMPA gauges on the entire Tibetan Plateau. The average KGE value in each subplot is highlighted in red.

Figure 12 .
Figure 12. (a) Spatial distribution of DML-TPP at the daily scale on the Tibetan Plateau. (b)-(d) Spatial distributions of precipitation products with partial input removed, including (b) DML-NoMe (without meteorological data), (c) DML-NoGs (without GSMaP-Gauge), and (d) DML-NoEr (without ERA5). (e)-(g) Spatial distributions of the corresponding changes in precipitation amount.

Table 1
Summary of Datasets Used in This Study

Table 2
The Explanation of Statistical Metrics Used to Evaluate Precipitation Datasets in This Study

Table 3
Statistical Summary of DML-TPP and Three DML-Based Products Developed by the Ablation Study (i.e., DML-NoMe, DML-NoGs, and DML-NoEr, Without Using Meteorological Information, GSMaP-Gauge, and Reanalysis Data, Respectively) Against Independent Rain Gauges Over the Tibetan Plateau. Note. The best performance among the four datasets for each metric is shaded in orange.