New Insights Into Error Decomposition for Precipitation Products

Quantifying the errors of precipitation estimation products is essential. However, existing methods do not describe all error components and are therefore not comprehensive. In this study, we propose a four-component error decomposition method (4CED) that decomposes the total error of precipitation products into four independent parts: hit positive bias, hit negative bias, false bias, and missed bias. We use it to evaluate the performance of three recent satellite precipitation products in the eastern monsoon region of China. Our study shows that 4CED offers clear improvements over the previous method. The results also provide new insights for tracking error sources and quantifying the error magnitudes of precipitation products. Moreover, the proposed 4CED can be extended to different spatial and temporal scales. The new method will not only contribute to product upgrades, but also provide guidance for potential applications.

In this study, an alternative four-component error decomposition method (4CED) is proposed to better describe the error components of precipitation products. The 4CED scheme decomposes the total error into four independent parts: hit positive bias, hit negative bias, missed bias, and false bias. We apply it to three state-of-the-art satellite precipitation products (SPPs) to illustrate the feasibility of the method. This study has two objectives: (1) to illustrate the enhancement offered by the new method; and (2) to provide new insights for identifying and tracking errors in precipitation products.

Methods
Traditionally, the mean error (ME) is the statistical indicator most commonly used to quantify the bias of satellite precipitation. It is obtained by summing the differences between SPPs and observations. Such an indicator is usually valid when comparing different SPPs, but it can give misleading results in some cases. For example, a dataset with large daily biases of both signs may add up to a small bias when averaged over a long time. Conversely, a product with small daily biases that are all positive may add up to a larger bias. More importantly, knowing only a total bias (even with some other metrics, see Text S2 for more discussion) prevents us from tracking the sources of the error in depth and from obtaining better feedback to improve the retrieval algorithms. Tian et al. (2009) decomposed the error of SPPs into three parts (hit bias, missed precipitation, and false precipitation) to better track the error sources associated with the satellite retrieval processes. However, hit bias, as one of the three error components in their method, is still a combined value obtained after positive and negative biases offset each other, and therefore does not accurately describe the magnitude of all individual errors. In this study, we extend their method to a more comprehensive version called four-component error decomposition (4CED), which further decomposes hit bias into a positive part and a negative part. Illustrations and an example can be found in Figure 1.
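The cancellation effect described above can be illustrated with a small numerical sketch. The two bias series below are hypothetical values chosen only for illustration, and the errors are averaged rather than summed, which is an assumption made for readability.

```python
import numpy as np

# Hypothetical daily biases (product minus reference, in mm) for two products.
bias_a = np.array([12.0, -11.0, 9.0, -10.0])  # large daily errors of both signs
bias_b = np.array([2.0, 1.0, 3.0, 2.0])       # small daily errors, all positive


def mean_error(bias):
    """Average signed difference; positive and negative days can cancel."""
    return float(np.mean(bias))


def mean_absolute_error(bias):
    """Average unsigned difference; no cancellation is possible."""
    return float(np.mean(np.abs(bias)))


me_a, mae_a = mean_error(bias_a), mean_absolute_error(bias_a)
me_b, mae_b = mean_error(bias_b), mean_absolute_error(bias_b)

print(f"product A: ME={me_a:.2f} mm, MAE={mae_a:.2f} mm")
print(f"product B: ME={me_b:.2f} mm, MAE={mae_b:.2f} mm")
```

In this sketch product A looks better than product B when judged by ME alone, although its daily errors are several times larger.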
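We can further derive a binary-valued hit precipitation event mask, which equals 1 where both the SPP and the reference data report precipitation and 0 otherwise. Among the four error components, HPB and FB are always positive, and HNB and MB are always negative. Their relationships with the mean error (ME) and the mean absolute error (MAE) are:
$$\mathrm{ME} = \mathrm{HPB} + \mathrm{HNB} + \mathrm{FB} + \mathrm{MB}$$
$$\mathrm{MAE} = \mathrm{HPB} - \mathrm{HNB} + \mathrm{FB} - \mathrm{MB}$$
The MAE, with range $[0, +\infty)$, accumulates the absolute difference between the SPPs and the reference data for each day, whereas the ME, with range $(-\infty, +\infty)$, directly accumulates the difference. The closer MAE (or ME) is to 0, the smaller the total error.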
Through the intermediate steps in the above equations, we can not only derive the classical ME error decomposition but also obtain many related statistical indicators, such as the probability of detection (POD) and the false alarm ratio (FAR) (see Text S1). Moreover, these equations can also be applied to other temporal scales (e.g., hourly).
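The decomposition can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `four_ced`, the rain/no-rain threshold of 0.1 mm/day, and the choice to zero out sub-threshold values are all assumptions made here so that the identities hold exactly.

```python
import numpy as np


def four_ced(sat, ref, threshold=0.1):
    """Split the accumulated error of `sat` against `ref` into four parts.

    `threshold` is an assumed rain/no-rain cutoff in mm/day (not from the
    source). Values below it are treated as zero precipitation.
    """
    sat = np.asarray(sat, dtype=float)
    ref = np.asarray(ref, dtype=float)
    sat = np.where(sat >= threshold, sat, 0.0)
    ref = np.where(ref >= threshold, ref, 0.0)
    diff = sat - ref

    # Binary event masks: hit, false alarm, and miss.
    hit = (sat > 0) & (ref > 0)
    false_alarm = (sat > 0) & (ref == 0)
    miss = (sat == 0) & (ref > 0)

    return {
        "HPB": float(diff[hit & (diff > 0)].sum()),  # hit positive bias
        "HNB": float(diff[hit & (diff < 0)].sum()),  # hit negative bias
        "FB": float(diff[false_alarm].sum()),        # false bias
        "MB": float(diff[miss].sum()),               # missed bias
        "ME": float(diff.sum()),
        "MAE": float(np.abs(diff).sum()),
        "n_hit": int(hit.sum()),
        "n_false": int(false_alarm.sum()),
        "n_miss": int(miss.sum()),
    }


# Hypothetical five-day series (mm/day) for one grid cell.
sat = [5.0, 0.0, 3.0, 2.0, 0.0]
ref = [3.0, 4.0, 0.0, 6.0, 0.0]
out = four_ced(sat, ref)

# Detection scores follow from the event counts.
pod = out["n_hit"] / (out["n_hit"] + out["n_miss"])
far = out["n_false"] / (out["n_hit"] + out["n_false"])
print(out, pod, far)
```

For this toy series the four components sum to the ME and their absolute values sum to the MAE, while the three-component hit bias (HPB + HNB) hides part of the hit error through cancellation.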

Data
We apply our error decomposition scheme to three state-of-the-art satellite precipitation products (SPPs) in this study, namely the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Dynamic Infrared Rain Rate near-real-time product (PDIR-Now, hereafter PDIR), the Integrated Multi-satellitE Retrievals for Global Precipitation Measurement Final Run V6B (hereafter IMERG) (Huffman et al., 2015, 2019), and the Gauge-calibrated Global Satellite Mapping of Precipitation Near Real-time product version 6 (hereafter GSMaP) (Kubota et al., 2020). Their main differences lie in algorithm design, input data, parameter estimation, and the presence of bias correction. More detailed information can be found in Table S1 as well as in some earlier studies (Hsu et al., 1997; Hong et al., 2004; Nguyen et al., 2018, 2019). The China Meteorological Administration (CMA) gridded precipitation dataset, with a 0.5° spatial and daily temporal resolution, was used as the reference in this study. This dataset was developed by interpolating high-quality precipitation observations from more than 2400 weather stations over China to a 0.5° × 0.5° grid using the Global 30 Arc Second Elevation Data Set (GTOPO30) and the thin plate smoothing spline (TPS) method. To date, this is probably the most accurate gridded precipitation product for China, and it has also been confirmed to best describe the spatiotemporal distribution of precipitation.
The study area is the eastern monsoon region of China, chosen because of its relatively high gauge density (see Figure S1; see also Figure 1b in Su et al., 2019). Seventeen years of precipitation data (January 1, 2003 to December 31, 2019) were used for the analysis. The difference between the SPPs and CMA is termed the precipitation bias.

Results
The results of our study are divided into three parts. First, we randomly choose one summer (June, July, and August 2012, hereafter JJA12) as an example to highlight the improvement of our new method relative to the old one. Second, we show the spatially averaged, multi-year average intra-annual variability of the different error components in terms of frequency, magnitude, and percentage, together with a comparison with the classical method. Finally, we illustrate the uniqueness of the new method by detecting the spatial (both magnitude and frequency) and seasonal (winter, summer, and all seasons) characteristics of the dominant error components.

The Enhancement of 4CED
As mentioned in Section 2, in some cases the two individual components (positive and negative bias) may cancel each other, resulting in a smaller total bias (ME). The three-component error decomposition method (Tian et al., 2009) was proposed to better understand the error characteristics hidden in the total error. However, the same cancellation occurs within the hit bias of the three-component decomposition: the hit bias is not completely decomposed, because it is calculated in the same way as ME, with both positive and negative cases. Here, we selected JJA12 as a case to test these hypotheses and to illustrate the rationality of the four-component error decomposition. [...] (Figure 2d), respectively. In terms of magnitude, the ME of all SPPs ranges from −200 to 200 mm, whereas the MAE can reach 1000 mm or even higher in some regions. The three datasets share similar spatial characteristics of ME, namely underestimation in the source region of the Yangtze River basin and overestimation in most of the Pearl River basin. The MAE, on the other hand, exhibits different spatial patterns, and the Hengduan Mountains are a hot spot for all SPPs. These differences can be further explained by Figure 2b, where all products share considerable similarities in the spatial distribution of bias (both underestimation and overestimation) in the Hengduan Mountains. This confirms that some large biases may cancel out, resulting in a relatively small total bias. The high MAE of all SPPs in the Hengduan Mountains is due to the combined effect of overestimation and underestimation in the same region, which ME either fails to capture or underestimates. Such underestimations are prone to occur in regions where many factors influence precipitation processes (e.g., coastal areas and high mountains) (Grose et al., 2019; Masson & Frei, 2014; Yao et al., 2016).
Similarly, by further decomposing the biases, especially the hit bias, into positive and negative parts using our four-component approach, the sources of errors in SPPs and their magnitudes can be identified and quantified in greater depth. Comparing FB with HPB, we can see that FB dominated the precipitation overestimation in the summer of 2012, and this feature is present in all products. HPB has a relatively small magnitude and only adds to the total overestimation. On the other hand, MB and HNB show similar spatial patterns, with MB dominating the underestimation of precipitation in the summer of 2012. If we use the old method (Tian et al., 2009), however, we can only obtain HB in Figure 2d and MB and FB in Figure 2c, and cannot further compare and analyze the magnitudes and spatial characteristics of HNB and HPB. In contrast, our method yields HB as well as its components HPB and HNB, which gives a more comprehensive understanding of all error components of precipitation products. Therefore, the 4CED incorporates and extends the original error decomposition method (Tian et al., 2009).
In summary, Figure 2 reveals that using only the ME and the three-component error decomposition may underestimate the magnitudes of errors and may miss some critical error components (HPB and HNB) as well as regions with relatively poor precipitation estimation skill, such as the Hengduan Mountains. In contrast, the newly proposed method not only yields the ME and MAE, but also provides a more comprehensive description of the spatial distribution and magnitudes of the individual error components.

Temporal Characteristics of Error Components
To explore the variation of errors over time, especially the differences between seasons, we calculated the spatially averaged, multi-year average (2003-2019) intra-annual variation of the error components for the three SPPs (Figure 3). A 31-day moving average was applied to each time series to smooth it and reduce visual clutter.
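The smoothing step can be sketched as a centered 31-day moving average over a day-of-year climatology. The source does not state how the ends of the year are handled; wrapping around the year boundary is an assumption made here, as is the function name.

```python
import numpy as np


def circular_moving_average(series, window=31):
    """Centered moving average that wraps around the ends of the series.

    Assumes `series` is a day-of-year climatology, so that the day after
    the last entry is the first entry (wrap-around is an assumption).
    """
    series = np.asarray(series, dtype=float)
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    if window > series.size:
        raise ValueError("window must not exceed the series length")
    half = window // 2
    # Pad each end with values from the opposite end of the year.
    padded = np.concatenate([series[-half:], series, series[:half]])
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


# Hypothetical climatology: a seasonal cycle plus day-to-day noise.
rng = np.random.default_rng(0)
days = np.arange(365)
raw = np.sin(2 * np.pi * days / 365) + rng.normal(0.0, 0.5, size=365)
smooth = circular_moving_average(raw, window=31)
```

With wrap-around padding the smoothed series keeps the original length and the same annual mean, so the smoothing does not shift the overall level of an error component.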
For the different event frequencies (Figure 3a), all SPPs were able to detect no-rain events (correct negatives, CN) relatively accurately. At the same time, most of the products achieved a high probability of correct rain/no-rain recognition when hit bias is not considered. In the winter season in particular, this probability can reach 70% or even higher. However, when we analyze the error components other than CN, the old method (Tian et al., 2009) can only give the total number of hit events, which is the sum of two event types (hit positive and hit negative events in 4CED) and therefore accounts for a higher percentage. In fact, among the four error events, false alarm precipitation (FB) is the one that occurs most frequently, and the 4CED also allows us to separate the proportions of hit positive and hit negative events. There was also a proportion of missed precipitation, which occurred more frequently in summer. Figure 3b depicts the seasonal variation in the magnitudes of ME, HB, HPB, HNB, FB, and MB. With the 4CED, we can capture more error components (HPB and HNB) and recognize their interactions, which have been ignored in previous studies. A relatively small ME is easily obtained when the positive and negative error components have about the same amplitude, which may mislead us into thinking that a sufficiently accurate precipitation estimate has been obtained, especially after averaging over a long-term period. For daily precipitation estimates in particular, however, the bias of SPPs may still be very large, which is a serious drawback for applications that depend on daily values. Similarly, a very small hit bias is obtained when HPB and HNB cancel each other out. In the three-component error decomposition, the hit bias is one of the final error components, so HPB and HNB are missed and their magnitudes remain unknown.
Figure 3c depicts the seasonal variations in the contributions of the different error components, from which the dominant factors can be discerned. For example, for all SPPs, the bias in summer is dominated by HPB, with a contribution greater than 25%, leading to positive ME and HB in Figure 3b. In winter, by contrast, the dominant factor is MB, with a contribution close to 50%, resulting in a negative ME. The relative contributions of the four components and their seasonal variations can yield new insights into sources of error in precipitation products, especially the positive and negative parts of the hit events. This has not been addressed in previous studies (Tian et al., 2009), because a combined value (HB, which mixes positive and negative parts) cannot be used together with MB and FB to calculate relative contributions. Recognizing these new error components helps algorithm developers and data producers to further correct and reduce the errors of precipitation products.
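One plausible way to compute such relative contributions is to express the absolute value of each component as a percentage of the summed absolute values. The source does not give the exact formula behind Figure 3c, so this definition and the function name are assumptions; the input numbers are hypothetical.

```python
def relative_contributions(hpb, hnb, fb, mb):
    """Percentage share of each component in the total absolute error.

    Assumed definition: |component| / (|HPB| + |HNB| + |FB| + |MB|) * 100.
    """
    magnitudes = {"HPB": abs(hpb), "HNB": abs(hnb), "FB": abs(fb), "MB": abs(mb)}
    total = sum(magnitudes.values())
    if total == 0:
        # No error at all: every share is zero by convention.
        return {name: 0.0 for name in magnitudes}
    return {name: 100.0 * value / total for name, value in magnitudes.items()}


# Hypothetical accumulated components (mm) for one day of the year.
shares = relative_contributions(hpb=2.0, hnb=-4.0, fb=3.0, mb=-4.0)
dominant = max(shares, key=shares.get)
print(shares, dominant)
```

Because every component has a fixed sign, taking absolute values loses no information here, which would not be true for the combined hit bias.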

Spatial Characteristics of Error Components
As emphasized above, hit bias (HB), like ME, is a combined value that can be positive or negative. Therefore, even with a three-component error decomposition, it is not appropriate to compare HB with MB and FB to identify the dominant error source. With the 4CED, however, each component is fully decomposed and its sign is fixed, so we can take absolute values and compare the components with each other to determine the true dominant error source. For example, to compare the spatial characteristics of the error components of the three SPPs over the whole study period (2003-2019) and the differences between seasons, we calculated the cumulative values [...] it appears that the magnitude of HPB for IMERG in the southeast coastal area is much larger than that of MB. For the summer season, the dominant factors are more diverse. Regions dominated by each of the four error components exist for all products, but they differ in spatial distribution and coverage. Moreover, some regions show error components with low frequency but high magnitude. For example, for IMERG and GSMaP, HPB dominates the frequency of error components in most regions, some of which are dominated by MB in terms of magnitude (e.g., the SongLiao River basin). Consequently, the dominant factors for the frequency of error components and their spatial distribution in all seasons are largely consistent with those of the winter season, while those for the magnitude of error components are closer to the spatial patterns of the summer season. This is possibly caused by the higher proportion of heavy rainfall events in summer.
The interaction of these error components enhances the total bias in some cases, while in other cases the components cancel each other out, resulting in a total bias that is much smaller than the individual components.
In short, compared with the three-component decomposition method (Tian et al., 2009), our method can directly compare all the independent error components of the full decomposition and determine their dominance as well as the temporal and spatial characteristics of their occurrence.
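The dominant-factor comparison can be sketched as a per-cell argmax over the absolute values of the four components. The grids below are hypothetical 2 × 2 arrays and the function name is an assumption; ties are resolved in favor of the component listed first.

```python
import numpy as np

COMPONENTS = ("HPB", "HNB", "FB", "MB")


def dominant_component(hpb, hnb, fb, mb):
    """Return, for each grid cell, the name of the largest |component|."""
    stacked = np.abs(np.stack([np.asarray(a, dtype=float) for a in (hpb, hnb, fb, mb)]))
    index = np.argmax(stacked, axis=0)  # position of the largest magnitude per cell
    return np.array(COMPONENTS)[index]


# Hypothetical cumulative components (mm) on a 2 x 2 grid.
hpb = [[1.0, 5.0], [0.0, 2.0]]
hnb = [[-3.0, -1.0], [0.0, -2.0]]
fb = [[2.0, 1.0], [4.0, 1.0]]
mb = [[-1.0, -2.0], [-1.0, -6.0]]

dominant_map = dominant_component(hpb, hnb, fb, mb)
print(dominant_map)
```

The same function can be applied to event counts instead of accumulated amounts to obtain a dominance map by frequency rather than by magnitude.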

Discussion and Conclusions
Identifying and quantifying the error sources of precipitation products can not only provide references for future algorithm research, but also help users better understand the advantages and limitations of these datasets. In this study, we propose a new framework for the error decomposition of precipitation products, named four-component error decomposition (4CED), which not only covers all components obtained in a previous study (Tian et al., 2009) but also provides new insights into the error sources, especially the magnitude of the hit bias (both positive and negative). The three state-of-the-art products we selected are the latest PDIR, IMERG, and GSMaP. The evaluation of these products confirms the validity of the proposed framework and shows how it complements the previous method in tracking and quantifying error sources. Furthermore, although three satellite precipitation products were selected in this study for a daily-scale case study in the eastern monsoon region of China, we believe that the 4CED could be applied to more precipitation products (including reanalysis and merged products), as well as to different spatial (from pixel to watershed and even global) and temporal (e.g., hourly) scales.
There are many statistical metrics in the existing literature to describe and assess the error of precipitation products. In this study, we illustrate the differences and connections between 4CED and two traditional metrics (ME and MAE), as well as the enhancement of 4CED compared with the traditional error decomposition method. We also summarize a series of commonly used metrics in Table S2. Each of these indicators describes only one or a few aspects of a precipitation product. For example, the correlation coefficient (R) can only describe the correlation between two series and cannot quantify the error between them. Similarly, the probability of detection (POD) is a probabilistic indicator used only for the assessment of rain (or no-rain) detection; it cannot quantitatively describe the error of the products. Mean error (ME) combines positive and negative values and gives only the average bias between two series. It describes the systematic bias between the two series and therefore underestimates the magnitude of the precipitation error, especially on a specific day. Mean absolute error (MAE) measures the average magnitude of the errors without considering their direction. Root mean squared error (RMSE) is a quadratic scoring rule that also measures the average magnitude of the error. Since the errors are squared before they are averaged, the RMSE gives a relatively high weight to large errors. Metrics such as the Nash-Sutcliffe efficiency (NSE) and Kling-Gupta efficiency (KGE) are integrated scoring rules that allow a better comparison of the relative goodness of different products, but they lose the ability to track and analyze sources of error. Therefore, depending on the needs and purpose of the application, end users combine multiple metrics to describe and characterize the different features of a product more fully.
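The difference between MAE and RMSE noted above can be made concrete with a short sketch. The two error series are hypothetical and the helper name is an assumption; the reference is set to zero so that the product values equal the errors.

```python
import numpy as np


def summary_metrics(sat, ref):
    """ME, MAE, and RMSE of `sat` against `ref` (same units as the input)."""
    diff = np.asarray(sat, dtype=float) - np.asarray(ref, dtype=float)
    return {
        "ME": float(diff.mean()),
        "MAE": float(np.abs(diff).mean()),
        "RMSE": float(np.sqrt((diff ** 2).mean())),
    }


ref = np.zeros(4)
even = summary_metrics([3.0, 3.0, 3.0, 3.0], ref)   # errors spread evenly
spiky = summary_metrics([1.0, 1.0, 1.0, 9.0], ref)  # one large error

print(even)
print(spiky)
```

Both series have the same ME and MAE, but the RMSE of the second is higher because squaring gives extra weight to the single large error. None of the three metrics says whether the error comes from hits, false alarms, or misses.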
However, the above metrics are usually used only to analyze the relative goodness of a product, and great care is required when interpreting the results (for more discussion and examples, see Text S2). In contrast, the error decomposition methods proposed by Tian et al. (2009) and AghaKouchak et al. (2012) not only evaluate the product but also provide a better analysis of the error sources of precipitation products. It is therefore important to highlight that the 4CED is a new member of this family of scoring rules, and we aim to use it to better analyze the error sources and to improve the quality of precipitation products based on the different error sources. The improved error decomposition scheme of Chaudhary & Dhanya (2021) is very similar to 4CED and was simultaneously validated and emphasized in India. The present study uses different precipitation products and a different study area, which also supports the generality of the methodology. It closely compares our method with the previous method (Tian et al., 2009) and, more importantly, elaborates the differences and connections between traditional scoring metrics and 4CED. The 4CED is applicable to the error analysis of all types of precipitation products, which allows the method to be disseminated more widely.
The reader should be reminded of the strategy used in our analysis of the different error components over time in Section 3.2. Because our study covers a relatively long period, we averaged the multi-year data to obtain an intra-annual variation. However, the accuracy and error features of SPPs may vary from year to year. These variations may reflect improvements due to algorithm updates or changes in the available data sources. In future studies, we can further partition the study period and compare the characteristics of the errors over different sub-periods (Shen et al., 2020; Tang et al., 2020). Spatial averaging poses similar problems, but our approach can be generalized to different spatial and temporal scales. Another issue is that all comparisons in this study were performed at a daily, 0.5° × 0.5° resolution because of the fixed resolution of the reference data. Therefore, the results presented may not fully demonstrate the advantages that some of the products were designed to offer. For example, PDIR has the highest spatial resolution and the shortest lag time among all SPPs. Such a product may help us focus on more regional precipitation features and near-real-time applications, especially in regions with sparse stations. The same applies to the high temporal resolution of the IMERG dataset. All these issues need to be further investigated in the future.