Transfer of satellite rainfall error from gaged to ungaged locations: How realistic will it be for the Global Precipitation Mission?



[1] In this study, we investigate the fundamental open question facing the satellite rainfall data community today: if “error” is defined on the basis of independent ground validation (GV) rainfall data, how can these error metrics be estimated for a satellite rainfall data product without the need for extensive GV data? Using a six-year database of high-resolution (0.25 degree and 3 hourly) satellite rainfall data over the United States and an optimal spatial interpolation method (ordinary kriging), we demonstrate that some error metrics (such as bias and probability of detection) are more amenable to ‘transfer’ from gaged to ungaged locations than others. Our findings also indicate that a continuously calibrated and regionalized error transfer scheme is technically feasible within the neighborhood of a gaged region if more research is carried out on the role played by different interpolation methods and the temporal structure of error.

1. Introduction

[2] NASA's planned Global Precipitation Measurement (GPM) mission, in collaboration with other international space partners, will represent a unique constellation of rain-measuring satellites comprising passive microwave (PMW) sensors, augmented by a Tropical Rainfall Measuring Mission (TRMM)-like dual-frequency precipitation radar (DPR) [Hou et al., 2008]. GPM is currently scheduled for launch in 2013, and it will provide high-resolution global precipitation products (i.e., snow and rainfall) with temporal sampling rates ranging from three to six hours and spatial resolution of 25–100 km². Hence, among the various uses, hydrologic application over land will comprise a major avenue through which GPM will be able to demonstrate tangible benefits to society. In particular, the global nature of coherent and more accurate satellite precipitation products (from PMW sensors [see Turk and Miller, 2005]) anticipated from GPM should offer hydrologists tremendous opportunities to improve water resources monitoring in large river basins where rainfall (hereafter used synonymously with ‘precipitation’) is abundant but in situ measurement networks are generally inadequate or declining [Shiklomanov et al., 2002].

[3] While the benefits from GPM are conceptually apparent, hydrologists and other users, to varying degrees, need to know the errors of the satellite rainfall data sets across the range of time/space scales over the whole domain of the data set prior to real-world applications [Hossain and Huffman, 2008]. Representing the error structure of satellite rainfall against quality-controlled ground validation (GV) precipitation datasets is therefore a critical research problem. Recent work has shown that the error structure of satellite precipitation estimates is increasingly complex at smaller scales at which data is now becoming more available [Hossain and Huffman, 2008; Ebert, 2008].

[4] Hence, the error of satellite rainfall data represents a paradox that has remained unresolved until today. Satellite rainfall error estimation requires GV rainfall data. On the other hand, satellite data will be most useful over the vast ungaged regions that are lacking in GV data. Depending on how we define GV data, there can be several types of GV ‘voids’ where error information will be difficult to estimate. For example, if we rely on the ‘conventional’ ground source for GV data, voids will be represented by large regions having little or no instrumentation. On the other hand, if a ‘proxy’ for GV is defined, such as the TRMM PR or the proposed GPM DPR, then voids will be numerous grid boxes changing in location with the time-varying satellite overpasses. We are therefore faced with the following unanswered question for GPM: if “error” is defined on the basis of GV data, then how can these error metrics be estimated for a global data product without the need for extensive GV data?

[5] A middle ground to resolve the above paradox could be to extract error information from a sensor of the highest accuracy currently in orbit (such as the TRMM-like PR on board the GPM) or from nearby sparsely gaged regions and devise calibrated statistical methods for ‘transfer’ of this error information to the neighboring ungaged regions (see Figure 1 for a conceptual rendition). However, the ‘transfer’ of error information from gaged to ungaged locations is clearly an untested idea that needs to be assessed if the benefit of GPM is to be maximized. In this study, our goal is to identify the level to which error can be ‘transferred’ from a gaged (GV) location to a nearby ungaged (non-GV) location. If the idea is found realistic, then the work already accomplished on global classification of precipitation systems [Petersen and Rutledge, 2002] will consequently hold promise for development of a real-time and regionalized error metric scheme for GPM products and their users.

Figure 1.

Conceptual rendition of the idea of ‘transfer’ of error information from a gaged (GV) location to an ungaged (non-GV) location. (top) Notion of ‘error’ of satellite rainfall data (in this case, the scalar deviation of magnitudes is termed ‘error’ although there are many other types of error). (bottom) How the known error (derived from GV sites shown (middle) in black) would be ‘transferred’ to the non-GV (ungaged) sites shown (right) in blue.

2. Study Region, Data, and Spatial Interpolation Method

[6] The study region for testing our idea of error ‘transfer’ was the Central United States (US). The geolocations of the four corners of this region are provided in Table 1. Hereafter, the word ‘transfer’ will be frequently interchanged with ‘spatial interpolation’. In order to minimize the error of the GV data in our investigation, we used the National Centers for Environmental Prediction's (NCEP) 4 km Stage IV NEXRAD rainfall data that is adjusted to gages over the US [Fulton et al., 1998; Y. Lin and K. Mitchell, The NCEP Stage II/IV hourly precipitation analyses: Development and applications, paper presented at the 19th AMS Conference on Hydrology, American Meteorological Society, San Diego, California, 2005]. NASA's near real-time satellite rainfall data products from PMW-calibrated infrared (IR) estimates and merged PMW-IR estimates, labeled 3B41RT and 3B42RT, respectively, were used as the satellite rainfall data [Huffman et al., 2007]. These are globally available on a near real-time basis at 0.25 degree and 1–3 hourly resolution from the world wide web. The GV and satellite rainfall data spanned the period 2002–2007 (6 years). A point to note is that there also exists a research-grade satellite product, 3B42 (V6), that is produced by NASA retrospectively by adjusting the bias using gage rainfall. Although the research-grade 3B42 (V6) product is known to have lower levels of uncertainty, this study focused on testing the concept of transfer in the operational mode using real-time (RT) products.

Table 1. Geolocation of the Four Corners of the Study Region Shown in Figure 3
Corner               Longitude (West)   Latitude (North)
Upper left corner    −104.5             43.5
Upper right corner   −88.25             43.5
Lower left corner    −104.5             33.5
Lower right corner   −88.5              33.5

[7] The method of ordinary kriging (OK) was used for testing the ‘transfer’ of error metrics from a gaged to an ungaged location. Ordinary kriging is one of the most common spatial interpolation methods. Its estimator $\hat{Z}(x_0)$ is used to find the best linear unbiased estimate of a second-order stationary random field with an unknown constant mean as follows:

$$\hat{Z}(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i) \qquad (1)$$

where $\hat{Z}(x_0)$ = kriging estimate at location $x_0$; $Z(x_i)$ = sampled value at location $x_i$; $\lambda_i$ = weighting factor for $Z(x_i)$; and $n$ = number of sampled locations. For further details on the method of OK, the reader is referred to Deutsch and Journel [1992].
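The estimator in equation (1) can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the standard ordinary kriging system, in which the weights are obtained from the semi-variances between sampled locations and are constrained to sum to one through a Lagrange multiplier. The function names `solve_linear` and `ordinary_kriging`, and the semi-variogram `gamma` used in the example, are hypothetical. A numerical library would normally replace the small hand-written solver.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(points, values, target, gamma):
    """Return (estimate, weights) at `target` from sampled `points`.

    gamma: semi-variogram function of separation distance (assumed model).
    """
    n = len(points)
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    # Semi-variances between samples, bordered by the Lagrange row and
    # column that force the weights to sum to one (unbiasedness).
    a = [[gamma(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(dist(p, target)) for p in points] + [1.0]
    weights = solve_linear(a, b)[:n]  # drop the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights

# Illustrative use: four gaged grid boxes around one ungaged grid box.
gamma = lambda h: 1.0 - math.exp(-h / 3.0)  # hypothetical semi-variogram
pts = [(0, 0), (2, 0), (0, 2), (2, 2)]
vals = [1.0, 2.0, 3.0, 4.0]
est, w = ordinary_kriging(pts, vals, (1, 1), gamma)
```

In this symmetric example the four weights are equal, so the kriged value is the plain average of the four samples.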

3. Methodology

[8] The NEXRAD Stage IV GV rainfall data was first remapped to 0.25 degree 3 hourly resolution for consistency with the native scale of the satellite rainfall products. Four widely-used error metrics were then computed for 3B41RT and 3B42RT products over the 6 year period to derive a relatively stationary spatial field of ‘climatologic’ error metrics for the study region. These metrics were: Bias (BIAS), Root Mean Squared Error (RMSE), Probability of Detection (POD) and False Alarm Ratio (FAR). The reader is referred to Ebert et al. [2007] for the formulation of these error metrics.
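As a rough illustration of the four metrics at a single grid box, the sketch below uses commonly adopted definitions, which are an assumption here and may differ in detail from the formulation in Ebert et al. [2007]. BIAS is taken as the mean of satellite minus GV rainfall, POD as hits/(hits + misses), and FAR as false alarms/(hits + false alarms). The function name `error_metrics` and the `rain_threshold` parameter are hypothetical.

```python
import math

def error_metrics(sat, gv, rain_threshold=0.0):
    """BIAS, RMSE, POD and FAR for paired rainfall series at one grid box.

    sat, gv: equal-length sequences of satellite and GV rainfall.
    rain_threshold: value above which a time step counts as rainy (assumed).
    """
    n = len(sat)
    diffs = [s - g for s, g in zip(sat, gv)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    pairs = [(s > rain_threshold, g > rain_threshold) for s, g in zip(sat, gv)]
    hits = sum(1 for s, g in pairs if s and g)
    misses = sum(1 for s, g in pairs if not s and g)
    false_alarms = sum(1 for s, g in pairs if s and not g)

    nan = float("nan")  # returned when a ratio is undefined
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    return {"BIAS": bias, "RMSE": rmse, "POD": pod, "FAR": far}

# Illustrative 3-hourly values at one 0.25 degree grid box (made-up numbers).
metrics = error_metrics([0, 1, 2, 0, 3], [0, 2, 0, 1, 3])
```

Applying such a function to every grid box over the full record would yield one spatial field per metric, which is the input to the interpolation step.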

[9] Spatial correlograms for each error metric were derived and the correlation length (CL), where the autocorrelation dropped to 1/e (e-folding distance), was then computed. Next, the empirical semi-variograms were derived and then idealized as exponential semi-variogram functions prior to the kriging interpolation as follows,

$$\gamma(h) = c_0 + c\left[1 - \exp\left(-\frac{h}{a}\right)\right] \qquad (2)$$

where γ(h) is the semi-variance at spatial lag h; c0 represents the nugget variance (i.e., the minimum variability observed, or the ‘noise’ level, as the separation distance approaches 0); c is the sill variance (approached as the spatial lag tends to infinity); and a is the correlation length. Figure 2 provides a summary of the ‘climatologic’ correlation length (e-folding distance) by season for various error metrics of the satellite rainfall products.
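The two quantities described above can be sketched as follows, assuming the common exponential form γ(h) = c0 + c[1 − exp(−h/a)] and a correlogram sampled at discrete lags. The e-folding distance is located by linear interpolation between the two lags that bracket 1/e, which is an illustrative choice rather than the procedure confirmed by the study. The function names are hypothetical.

```python
import math

def exp_semivariogram(h, c0, c, a):
    """Exponential semi-variogram with nugget c0, sill c, correlation length a."""
    return c0 + c * (1.0 - math.exp(-h / a))

def efolding_length(lags, acf):
    """Lag at which the autocorrelation first drops to 1/e.

    Linearly interpolates between the bracketing lags; returns None if the
    correlogram never falls to 1/e within the supplied lags.
    """
    target = 1.0 / math.e
    samples = list(zip(lags, acf))
    for (h0, r0), (h1, r1) in zip(samples, samples[1:]):
        if r0 > target >= r1:
            return h0 + (r0 - target) * (h1 - h0) / (r0 - r1)
    return None

# Illustrative correlogram, with lags in units of 0.25 degree grid boxes.
lags = [0, 1, 2, 3]
acf = [1.0, 0.5, 0.25, 0.1]
cl = efolding_length(lags, acf)
```

With lags expressed in grid boxes, the returned correlation length is also in grid boxes, matching the distance unit used in Figure 2.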

Figure 2.

Correlation length of error metrics for (top) 3B41RT and (bottom) 3B42RT shown as a function of season. Note the distance unit is 0.25 degree grid boxes (∼25 km). The vertical bars are shown in order from left to right as ‘Bias’, ‘RMSE’, ‘POD rain’, ‘POD no-rain’, ‘FAR’.

[10] Assuming that only 50% of the region was gaged (having access to GV data), kriging was implemented to estimate error metrics at the other 50% of the ungaged region (lacking in GV data; see Figure 1). This is analogous to a data withholding exercise using the dependent data. Selection of gaged grid boxes was random and hence each kriging realization was repeated 10 times in a Monte Carlo (MC) fashion to derive an average scenario of the ensemble. The semi-variogram and correlation length were computed on the basis of the 50% of the assumed ‘available’ data. To keep the matrix computations of kriging efficient, spatial interpolation was performed using a smaller square ‘window’ around the ungaged grid box in place of the entire collection of gaged grid boxes in the whole region. The sides of this square window were equal to the correlation length of the error metric being ‘transferred’. Preliminary analyses showed that such moving-window kriging was justified because the interpolation weights λi (equation (1)) for grid boxes farther than one correlation length away were found to be zero.
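The withholding experiment can be outlined with the sketch below, under several stated assumptions. The window is taken to be centered on the ungaged grid box, the interpolator is passed in as a function so that any kriging routine can be plugged in, and the semi-variogram refit on each 50% sample is omitted for brevity. All function names are hypothetical, and the simple neighbor mean used in the example is only a stand-in for kriging.

```python
import random

def split_gaged_ungaged(cells, gaged_fraction=0.5, rng=random):
    """Randomly split grid boxes into (gaged, ungaged) lists."""
    cells = list(cells)
    rng.shuffle(cells)
    k = int(round(gaged_fraction * len(cells)))
    return cells[:k], cells[k:]

def window_neighbors(target, gaged, side):
    """Gaged grid boxes inside a square window of width `side` on `target`."""
    half = side / 2.0
    return [c for c in gaged
            if abs(c[0] - target[0]) <= half and abs(c[1] - target[1]) <= half]

def monte_carlo_transfer(field, interpolate, side, n_real=10, seed=0):
    """Average interpolated error metric per grid box over n_real realizations.

    field: dict mapping (row, col) -> 'true' error metric at that grid box.
    interpolate(points, values, target) -> estimate (e.g., a kriging routine).
    side: window width, here set to the correlation length in grid boxes.
    """
    rng = random.Random(seed)
    sums, counts = {}, {}
    for _ in range(n_real):
        gaged, ungaged = split_gaged_ungaged(field, 0.5, rng)
        for target in ungaged:
            nbrs = window_neighbors(target, gaged, side)
            if not nbrs:
                continue  # no gaged grid box within the window
            est = interpolate(nbrs, [field[c] for c in nbrs], target)
            sums[target] = sums.get(target, 0.0) + est
            counts[target] = counts.get(target, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

# Illustrative run on a small synthetic field with a stand-in interpolator.
field = {(r, c): 2.0 for r in range(6) for c in range(6)}
neighbor_mean = lambda pts, vals, tgt: sum(vals) / len(vals)
averaged = monte_carlo_transfer(field, neighbor_mean, side=4, n_real=10, seed=1)
```

Passing the interpolator as an argument keeps the sampling logic separate from the choice of interpolation method, which is one of the open issues the study raises.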

4. Results

[11] Figure 3 shows the performance of kriging at non-GV grid boxes for the BIAS of 3B41RT. It appears that the transfer of bias via kriging does not lead to wholesale changes in the pattern of the error field when compared to the true climatologic error field (see Figure 3, left). However, a more rigorous assessment can be obtained through the comparison of the histograms (probability distribution) of kriging error with the marginal distribution of the kriging estimate. Herein, the kriging error is defined as the scalar difference between the kriged error metric and the true error metric. If the transfer of an error metric is indeed robust, then the kriging error distribution should have a near-zero mean (for unbiasedness) and a lower spread (minimum error variance) compared to the marginal distribution of the estimated error. Figures 4 and 5 show the comparison of the histograms for 3B41RT and 3B42RT, respectively, for BIAS and POD. It is seen that for BIAS and POD, the error histogram due to kriging has smaller variance compared to the marginal histogram of kriging estimates. Table 2 summarizes the correlation between kriging estimated error and true value of error for different error metrics.
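The robustness checks described in this paragraph can be summarized numerically, as in the sketch below. It assumes that the kriged and true values of an error metric at the non-GV grid boxes are available as two equal-length lists. The function name `transfer_diagnostics` and the sample values are hypothetical, and population (not sample) variances are used as a simplifying choice.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def _var(xs):
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def transfer_diagnostics(kriged, true):
    """Summary statistics for judging the transfer of one error metric.

    kriged: interpolated error metric at the non-GV grid boxes.
    true:   'true' climatologic error metric at the same grid boxes.
    """
    errors = [k - t for k, t in zip(kriged, true)]  # kriging error
    mk, mt = _mean(kriged), _mean(true)
    cov = sum((k - mk) * (t - mt) for k, t in zip(kriged, true)) / len(kriged)
    return {
        "mean_error": _mean(errors),          # near zero if unbiased
        "error_variance": _var(errors),       # spread of the kriging error
        "estimate_variance": _var(kriged),    # spread of the marginal
        "correlation": cov / math.sqrt(_var(kriged) * _var(true)),
    }

# Illustrative values only; not taken from the study.
diag = transfer_diagnostics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

A transfer would be judged favorably when `mean_error` is close to zero and `error_variance` is well below `estimate_variance`, mirroring the histogram comparison in Figures 4 and 5.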

Figure 3.

Transfer of BIAS of 3B41RT from gaged to ungaged locations. (top left) True field of error on bias based on 6 years of data. (bottom left) The randomly selected 50% of the region for computation of the empirical variogram and correlation length. (bottom middle) The other 50% of the region that is assumed to be non-GV grid boxes. (bottom right) The estimation of the bias at the non-GV grid boxes using kriging.

Figure 4.

Comparison of histograms of kriging errors and kriging values for 3B41RT (top) BIAS and (bottom) POD.

Figure 5.

Comparison of histograms of kriging errors and kriging values for 3B42RT (top) BIAS and (bottom) POD.

Table 2. Correlation Between Kriged Estimate of an Error Metric and the True Climatologic Value
Error Metrics   Bias   RMSE   POD   FAR

[12] As a preliminary analysis, the use of an optimal spatial interpolation method, such as ordinary kriging, for the transfer of error metrics appears promising at ungaged locations. All four error metrics were found amenable to transfer. Across satellite data products, kriging appears moderately more effective for the IR-based 3B41RT than the multi-sensor PMW-IR-based 3B42RT. This is not unexpected because of the lower correlation length and spatial dependency of error metrics for 3B42RT. The grid boxes pertaining to non-PMW overpasses for the 3B42RT product are essentially supplied from the 3B41RT product. This simple style of mosaicking a dataset from two different spatial random fields, while improving the quality of the rainfall estimate in terms of bias and RMSE, actually weakens the spatial structure by adding more spatial randomness to the data.

5. Discussion

[13] Overall, our assessment indicates that it is indeed technically possible to transfer error metrics from a gaged to an ungaged location for certain error metrics and that a regionalized error metric scheme for GPM may one day be possible. However, our work has also opened a much wider range of issues that require research before such a system can be implemented for GPM. First, the choice of a randomly selected 50% of grid boxes may be somewhat unrealistic during the GPM era. Such a randomly selected combination of grid boxes is perhaps realistic if the orbiting GPM PR is considered as the only source of GV data for the transfer of error metrics. The role played by the fraction of a region missing GV data in the effectiveness of the transfer of error also needs to be investigated. Another aspect that needs to be studied is the assumption of stationarity of error metrics that is critical for kriging. If a system is desired that can routinely provide an estimate of time-varying error metrics at ungaged locations in lieu of ‘climatologic’ values for a region, then the temporal structure of errors would need to be analyzed first.


[14] The first author (Tang) was supported by the NASA Earth System Science Fellowship (2008–2011). The second author (Hossain) was supported by the NASA New Investigator Program Award (NNX08AR32G).