Short- to medium-range superensemble precipitation forecasts using satellite products: 2. Probabilistic forecasting



[1] Short- to medium-range probabilistic precipitation forecasts over the global tropics are explored using satellite products from the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) and Special Sensor Microwave/Imager (SSM/I) instruments. In addition to the conventional probability of precipitation (POP) forecast, superensemble (SE) POP forecasts are introduced and applied to the multianalysis, multicumulus-scheme, and multimodel ensemble configurations at two different horizontal resolutions. It is shown that an ensemble system using a single model has a more consistent bias, which can at least partially be removed by a simple bias correction. With the aid of properly prepared ensemble members, meaningful POP forecasts attain much longer forecast lead times. Results also show that the family of higher-resolution forecasts is more effective at removing model biases. The advantage of the SE approach over the conventional method is evident in POP forecasts: the skills of the SE POP are 10 to 20 percent better than those of the bias-corrected ensemble.

1. Introduction

[2] It is well known that rain rates over the global tropics are difficult to assess. However, the advent of satellite observing systems and the advancement of retrieval algorithms have partially solved this problem. As an extension of the previous companion superensemble (SE) study of deterministic precipitation forecasts (part 1 [Shin and Krishnamurti, 2003]), the probability of precipitation (POP) forecast will be investigated using satellite-based rain products. A probabilistic SE approach will be proposed in the present paper as well.

[3] While the ensemble mean represents a consensus deterministic forecast, the distribution of forecasts in an ensemble is most useful for generating probabilistic forecasts. Probabilistic predictions differ from the deterministic forecasts in that, depending on the expected likelihood of forecast events, they assign a probability value between 0 and 1, instead of exclusively using no (0) and yes (1) as forecast outcomes. Most forecasts are associated with uncertainty and the level of uncertainty is situation dependent. The use of probabilities allows the generator of a forecast to explicitly express the level of uncertainty associated with a given forecast.

[4] A probabilistic forecast is one that estimates the probability of occurrence of a chosen event E. The event type selected for this study is the “precipitation exceeding a pre-defined threshold level.” The probability of a forecast event from an ensemble system at a fixed point is based on the fraction of ensemble members predicting that event. For an ensemble of equally reliable models, the probability of the event E is (n/N) × 100%, where n is the number of ensemble members forecasting E, and N is the total number of ensemble forecasts.
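As a minimal sketch of this definition (the function and variable names here are our own, not from the paper), the conventional ensemble POP at a point is simply the fraction n/N of members forecasting the event:

```python
def pop_forecast(member_rain, threshold):
    """Conventional POP at one grid point: the fraction n/N of ensemble
    members whose forecast rain rate meets or exceeds the threshold."""
    n = sum(1 for r in member_rain if r >= threshold)
    return n / len(member_rain)

# Four members forecasting 12, 3, 15, and 9 mm/d against a 10 mm/d threshold
# give a POP of 2/4 = 0.5.
```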

[5] In recent years, there have been several studies of probabilistic precipitation forecasting that make extensive use of ensemble prediction systems [e.g., Du et al., 1997; Hamill and Colucci, 1998; Eckel and Walters, 1998; Buizza et al., 1999; McBride and Ebert, 2000; Mullen and Buizza, 2001]. All of these studies have shown that ensemble forecasts provide more accurate rainfall forecasts than a single model forecast.

[6] Utilizing the ensemble prediction system, Krishnamurti et al. [2000, 2001] laid a cornerstone for precipitation forecasting with an optimal ensemble forecast produced by combining individual forecasts from a group of models with the help of pre-computed optimal statistics, that is, an SE forecast. They showed that the deterministic SE precipitation forecasts for days 1, 2, and 3 are invariably superior to various conventional forecasts. However, they did not consider the probabilistic aspects of their SE forecasts; that aspect is covered in this study. Since the basic notion of the SE is that not all models are equally reliable at different points in space, it is to be expected that the probability associated with the resulting deterministic forecasts would not be the same for the SE and the regular ensemble. After suitably defining the corresponding probability for the SE method, it is desirable to compare the relative qualities of the two probabilistic forecasts.

[7] The present paper is devoted to the application of the SE approach to POP forecasts. Section 2 deals with the various satellite rain rate algorithms and physical initialization. After discussing ensemble members devised in this study in section 3, probabilistic superensemble techniques are introduced in section 4. Results of the probabilistic precipitation forecasts are then analyzed in section 5. Conclusions follow in section 6.

2. Satellite Rain Rate Products and PI

[8] This section describes the five different satellite rain rate algorithms employed in the present study. Most are based on passive microwave retrieval techniques. These precipitation estimation schemes exploit the cloud-penetrating nature of microwave radiation, which makes it possible to observe the vertical structure of rainfall systems. One of these satellite-retrieved rain rates will serve as the benchmark rainfall for verification purposes; model performance can then be evaluated against accurate observations.

2.1. OLSON

[9] The algorithm recommended by W. S. Olson et al. (Recommended algorithms for the retrieval of rainfall rates in the tropics using SSM/I (DMSP-F8), unpublished manuscript, 1990) is based on a statistical regression procedure that makes use of the brightness temperatures from the Special Sensor Microwave Imager (SSM/I). The SSM/I instrument, on board several Defense Meteorological Satellite Program (DMSP) satellites (F11, F13, F14, and F15), is a passive microwave radiometer with channels at four frequencies: 19.35, 22.235, 37.0, and 85.5 GHz. Of these, the 22.235 GHz channel senses only the vertical polarization, while the other channels are dual-polarized (vertical as well as horizontal). Currently, four DMSP satellites provide these radiometer data sets. For simplicity, all subsequent references to the SSM/I channels will truncate the decimal and will denote the vertical and horizontal polarizations as V and H, respectively. The algorithm uses an exponential form of a nonlinear retrieval equation using the 85(H, V), 37(V), 19(V), and 22(V) GHz channels for precipitation over land and oceans. The upwelling microwave radiation in all these channels is first retrieved in the form of brightness temperatures (TB). The algorithm for the retrieval of the rain rate R, as a function of collocated SSM/I TBs, is based on regressing the TBs against rainfall rates (in logarithmic form, ln(R + C), where C is a constant) in stepwise multiple linear regression.

[10] This resulted in a regression equation of the form:

ln(R + C) = b0 + Σi bi TBi

and the following recommended algorithms for precipitation retrieval in the tropics by W. S. Olson et al. (unpublished manuscript, 1990) are used for land and oceans, respectively.

  1. For rainfall over the land
    equation image
  2. For rainfall over the oceans
    equation image

[11] In the case when the 85 GHz channels were unusable then

  1. For rainfall over the land
    equation image
  2. For rainfall over the oceans
    equation image

were used.

[12] As a part of the rain rate retrieval process, these relationships were determined using collocated SSM/I data and radar rainfall observations. It should be noted that the algorithm tends to underestimate heavy rain events.
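The general shape of an Olson-type retrieval follows directly from the regression on ln(R + C). The sketch below uses hypothetical coefficients (the published regression constants are not reproduced here):

```python
import math

def olson_like_rain(tb, coeffs, intercept, C):
    """Generic Olson-type retrieval: a stepwise linear regression predicts
    ln(R + C) from selected brightness temperatures, so the rain rate is
    R = exp(b0 + sum_i b_i * TB_i) - C, floored at zero.
    All coefficient values supplied by a caller are illustrative only."""
    ln_r_plus_c = intercept + sum(b * t for b, t in zip(coeffs, tb))
    return max(math.exp(ln_r_plus_c) - C, 0.0)
```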


2.2. FERRARO

[13] The Ferraro rainfall retrieval algorithm [Ferraro and Marks, 1995] makes use of the DMSP SSM/I and is calibrated against high-quality ground-based radar measurements. This algorithm is also called the NOAA/NESDIS (National Oceanic and Atmospheric Administration/National Environmental Satellite, Data, and Information Service) SSM/I algorithm. The Ferraro SSM/I rain algorithm includes both scattering- and emission-based aspects in its design as follows:

  1. Scattering algorithm: The scattering technique is best suited for land-based precipitation systems. The motivation for using a scattering approach is to get global rain estimates over land and ocean. No other technique can be used to meet this objective. The scattering index (SI) is defined as
    equation image
    equation image
    where TB denotes brightness temperature (K) and the subscript indicates the SSM/I channel.
  2. Emission algorithms: Over the ocean, atmospheric liquid water Q can be retrieved using any of the SSM/I window channels. The frequency at 19 GHz is, however, best suited for measurement of liquid water since it is the least sensitive to scattering. An algorithm to retrieve Q at 19 GHz is
    equation image
    Here, the contributions due to water vapor are removed using the 22 GHz channel.

[14] Because of the difficulties in accurately matching the SSM/I and radar measurements in both time and space, a binned approach, in which both measurements are grouped into 1 mm h−1 rain-rate bins, provides a much more accurate set of measurements for deriving the coefficients for instantaneous rain rate retrieval. The binned approach allows for the generation of representative SSM/I predictors as a function of rain rate for both land and ocean. Both linear and nonlinear relationships (rain rate retrieval equations) are developed, with the nonlinear fits being more accurate. Regression coefficients as a function of radar site, land, and ocean are obtained using the binned data sets. The linear regression is of the form R = aSSMI + b, where SSMI represents either the SI or Q19, while the nonlinear forms are R = aSI^b for the scattering index and R = a exp(bQ19) for the cloud liquid water algorithm. The best regression coefficients (a and b) are given by Ferraro and Marks [1995]. This algorithm is less sensitive to liquid water and more sensitive to ice particles aloft.
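The two nonlinear fits can be sketched as below; the coefficients a and b passed in are placeholders, not the published values of Ferraro and Marks [1995]:

```python
import math

def rain_from_scattering(si, a, b):
    """Nonlinear scattering-index fit (typically over land): R = a * SI**b.
    Coefficients a, b are hypothetical placeholders."""
    return a * si ** b

def rain_from_emission(q19, a, b):
    """Nonlinear cloud-liquid-water fit (over ocean): R = a * exp(b * Q19).
    Coefficients a, b are hypothetical placeholders."""
    return a * math.exp(b * q19)
```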

2.3. TMI2A12

[15] This is the TRMM Microwave Imager (TMI) 2A12 rainfall algorithm [Kummerow et al., 1996, 2000] that is supplemented by the NOAA/NESDIS SSM/I algorithm (see section 2.2) outside 35°S and 35°N. This rain rate will be regarded as our best estimate among the observed rain rates in the present research.

[16] TRMM is a joint U.S.-Japan satellite mission to monitor tropical and subtropical precipitation and to estimate its associated latent heating. TRMM was successfully launched on November 27, 1997, at 4:27 PM (EST) from the Tanegashima Space Center in Japan. The TRMM Microwave Imager (TMI) is a nine-channel passive microwave radiometer, which builds on the heritage of the SSM/I instrument flown aboard the DMSP platforms. The TMI frequencies duplicate those of the SSM/I, except that channels at 10.65 GHz (dual polarization) are added and the water vapor channel is moved from 22.235 to 21.3 GHz (vertical polarization only) to avoid saturation in the tropics. For a complete description of the TMI, the reader is referred to Kummerow et al. [1998].

[17] The TMI 2A12 algorithm (also referred to as the TMI profiling algorithm) is a slightly modified GPROF (Goddard Profiling) algorithm applied to the TMI data. The GPROF 4.0 SSM/I algorithm is a state-of-the-art rain rate algorithm that can be described as a physical profile algorithm. This algorithm, primarily designed with the operational goals of TRMM in mind, utilizes a computationally simple technique for retrieving the precipitation and vertical hydrometeor profiles from downward-viewing radiometers.

[18] The profiling algorithm is based upon a Bayesian approach that begins by establishing a large database of potential hydrometeor profiles and their computed brightness temperatures (TB). This database is computed from nonhydrostatic cumulus-scale cloud models using explicit cloud microphysics such as the Goddard cumulus ensemble model. Once the database is established, the retrieval searches the database. In Bayes's formulation, the probability of a particular profile R, given TB can be written as:

Pr(R | TB) ∝ Pr(R) × Pr(TB | R)

where Pr(R) is the probability with which a certain profile R will be observed and Pr(TB∣R) is the probability of observing the brightness temperature vector, TB, given a particular rain profile R. The probability that a profile R will be observed is taken from the cloud profile database. The second term is specified in the Bayesian formulation to be the Gaussian weight which depends upon the root mean square difference between observed and computed TB. Moreover, detailed three-dimensional radiative transfer calculations are used to determine the upwelling TBs from the cloud model to establish the similarity of radiative signatures and thus the probability that a given profile is actually observed. A more complete description of this portion of the algorithm is given by Kummerow et al. [1996].
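A toy schematic of this Bayesian database retrieval follows (invented database values, a scalar rain amount per candidate profile, and a single width parameter sigma stand in for the operational GPROF machinery):

```python
import math

def bayes_retrieval(tb_obs, db_rain, db_tb, sigma):
    """Weight each candidate profile by a Gaussian function of the rms
    difference between its computed brightness temperatures and the
    observed ones, then return the weighted-average rain value.
    db_rain: one (toy, scalar) rain value per database profile.
    db_tb:   the corresponding computed brightness-temperature vectors."""
    weights = []
    for tbs in db_tb:
        rms = math.sqrt(sum((t - o) ** 2 for t, o in zip(tbs, tb_obs)) / len(tb_obs))
        weights.append(math.exp(-0.5 * (rms / sigma) ** 2))
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, db_rain)) / total
```

When one database entry matches the observed TBs exactly and the rest are far away, the retrieval collapses onto that entry's rain value.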


2.4. SSMI/TMI

[19] F. J. Turk et al. (Blending coincident SSM/I, TRMM and infrared geostationary satellite data for an operational rainfall analysis, part I, Technique description, submitted to Weather and Forecasting, 2001) (hereinafter referred to as Turk et al., submitted manuscript, 2001) provide blended SSM/I and TMI rain rates on a high-resolution 0.25° latitude/longitude global grid (between 60°S and 60°N) in near-real time. The SSM/I rain rate is computed via the operational NOAA/NESDIS scheme used at the Navy Fleet Numerical Meteorology and Oceanography Center (FNMOC), at the A-scan sampling spacing (25 km) of the SSM/I scan operation. For TMI data, the real-time level 2A12 instantaneous rain rates are used directly [Kummerow et al., 2000]. The 2A12 data provide rain rates from the nine TMI channels using the GPROF algorithm, which is based upon a Bayesian approach [Kummerow et al., 1996]. The 2A12 products are provided on the 85 GHz scan grid of the TMI, where samples are spaced 7 km along and 4 km across the track.

2.5. TURK

[20] Turk et al. (submitted manuscript, 2001) developed an operationally-oriented combined rain rate retrieval algorithm (also referred to as the NRL (Naval Research Laboratory) technique) from the infrared (IR) brightness temperatures of the geostationary satellite (five current operational geostationary satellites: GOES-8 and 10, GMS-5, and Meteosat-5 and 7) and the microwave radiance data sets (the SSM/I and TMI data base).

[21] The blended geostationary IR and microwave-based rain rate estimation technique makes use of a dynamic adjustment (or calibration) of geostationary IR brightness temperature data by a real-time flow of the physically based SSM/I and TMI rain rates. Given the large number of SSM/I (F11, F13, F14, and F15) and TMI measurements, it is possible to collect a large database of temporally and spatially coincident geostationary IR pixels. The technique takes advantage of the physically based passive microwave rain rate estimates and the finer-scale infrared-based updates available from geostationary satellites. The adjustment of the IR data is based upon a statistical histogram-matching method between the IR brightness temperatures and the microwave-based rain rates, using a probability-matching approach within overlapping 15-degree global regions. This is designed to provide a smooth spatial transition in the multihour rain accumulation, which is computed using an explicit time integration of successive images. The dynamic adjustment ensures that the most recent microwave-based rain rate estimates are used to readjust the rain rates from the IR data.
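The histogram/probability-matching step can be illustrated with a toy sketch (the coincident sample values are invented): the coldest IR brightness temperatures are assigned the heaviest microwave rain rates so that the exceedance probabilities of the two quantities agree.

```python
def probability_match(ir_tb_sample, mw_rain_sample, query_tb):
    """Map an IR brightness temperature to a rain rate by rank within the
    coincident sample: colder cloud tops (lower TB) are matched to heavier
    rain so that the two cumulative distributions agree."""
    ir_sorted = sorted(ir_tb_sample)                    # coldest first
    rain_sorted = sorted(mw_rain_sample, reverse=True)  # heaviest first
    # rank of the query temperature within the IR sample
    idx = sum(1 for t in ir_sorted if t < query_tb)
    idx = min(idx, len(rain_sorted) - 1)
    return rain_sorted[idx]
```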

[22] The retrieval of rain rates from the geostationary IR data, as implemented in a real-time fashion, involves a background process that is constantly examining newly arriving microwave and geostationary data sets. Computation of instantaneous rain rates involves consulting lookup tables produced via the background process, and then applying a series of screens. The NRL technique was designed for rapid-update global rain estimation, and with its reliance upon microwave-based satellite data, it inherently is tied to the accuracy of the SSM/I and TMI rainfall algorithms.

2.6. Physical Initialization

[23] One of the essential components of this study is physical initialization (PI) which is extensively used in the generation of the multianalysis ensemble configuration. PI assimilates the aforementioned satellite-based observed measures of rain rates into an atmospheric forecast model [Krishnamurti et al., 1991]. The procedure allows the surface fluxes of moisture, the vertical distribution of the humidity variable, the mass divergence, the convective heating, the apparent moisture sink (following Yanai et al. [1973]) and the surface pressure to experience a spin-up consistent with the model physics and imposed rain rates. This is accomplished through a number of reverse physical algorithms within the assimilation mode.

[24] A schematic diagram of the procedures involved in PI is shown in Figure 1. The methodology of PI involves a reverse similarity theory, a reverse cumulus parameterization, an outgoing longwave radiation (OLR) matching, and a Newtonian relaxation of the model variables. A brief description of the PI procedure is as follows: (1) diagnostic calculations of surface fluxes are made, (2) the humidity analysis is made consistent with the surface fluxes and observed rainfall rates, and (3) the humidity analysis is made consistent with the net OLR. In order to incorporate the above analyses into the model, a Newtonian relaxation method is applied during a pre-integration phase corresponding to day -1 to day 0.

Figure 1.

A schematic diagram of the procedure of the physical initialization.
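The Newtonian relaxation step can be sketched as a nudging tendency applied during the day -1 to day 0 pre-integration. The scalar toy model below stands in for a full model variable, and the relaxation timescale tau is a free parameter of the sketch:

```python
def nudge(x, x_target, tau, dt, nsteps):
    """Newtonian relaxation sketch: relax a model variable toward the
    physically initialized target by adding a (x_target - x) / tau
    tendency at each time step."""
    for _ in range(nsteps):
        x = x + dt * (x_target - x) / tau
    return x
```

With tau comparable to the assimilation window, the variable is drawn close to its target by the end of the pre-integration.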

[25] Figure 2 illustrates the previously described five different observed 24-h averaged rain rates in the left panels (a1-5) and their corresponding physically initialized rains in the right panels (b1-5). As shown in the left panels, each satellite algorithm produces a quite different retrieved rain rate; the correlations among the algorithms are approximately 0.5 to 0.9. While the geographical locations of significant rain systems coincide quite well, the intensities differ considerably among algorithms. There is good agreement between the observed and the physically initialized rainfalls, with correlations ranging from 0.85 to 0.95 depending on the specific case and the satellite algorithm employed.

Figure 2.

Example maps of 24-hr averaged (a1-5) observed rainfall estimates (mm d−1) with different satellite rain rate algorithms and (b1-5) their corresponding physically initialized rainfall distributions.

3. Ensemble Members

[26] In this study, ensemble members of precipitation forecasts are constructed from three different categories of numerical experiments. These categories are multianalysis (MA), multicumulus-scheme (MC), and multimodel (MM) ensemble configurations.

[27] The retrieved rain rate products (see section 2) are assimilated into the model initial analyses through the FSU PI system in order to produce MA forecasts. The MA configuration includes the following six members: (1) CONTROL, (2) FERRARO, (3) OLSON, (4) SSMI/TMI, (5) TURK, and (6) TMI2A12.

[28] Owing to the presence of moisture in our atmosphere, cumulus convection is the most important physical parameterization to consider, particularly for the vertical distribution of heating, moistening, and the rain rates. There have been many efforts to properly parameterize cumulus clouds, given the information on the resolvable scale. In order to assess the impact of different cumulus parameterizations on the ensemble prediction system, six different state-of-the-art convective schemes are incorporated into the FSUGSM in this study. These are the (1) FSU (Kuo-type [Krishnamurti et al., 1983]), (2) NCEP/SAS (National Center for Environmental Prediction/Simplified Arakawa-Schubert [Pan and Wu, 1994]), (3) GSFC/RAS (Goddard Space Flight Center/Relaxed Arakawa-Schubert [Moorthi and Suarez, 1992]), (4) NRL/RAS (Naval Research Laboratory/Relaxed Arakawa-Schubert [Rosmond, 1992]), (5) NCAR/ZM (National Center for Atmospheric Research [Zhang and McFarlane, 1995]), and (6) EMANUEL [Emanuel and Živković-Rothman, 1999] schemes. The MC ensemble configuration is devised by using the above six members.

[29] The MM configuration is a set of independent NWP model precipitation forecasts. The participant centers and corresponding models are (1) BMRC/GASP (Bureau of Meteorology Research Centre/Global Assimilation and Spectral Prognosis or Global AnalysiS and Prediction), (2) JMA/GSM (Japan Meteorological Agency/Global Spectral Model), (3) NRL/NOGAPS (Naval Research Laboratory/Navy Operational Global Atmospheric Prediction System), (4) RPN/GEM (Recherche en Prévision Numérique/Global Environmental Multiscale model), (5) NCEP/AVN/MRF (National Center for Environmental Prediction/Aviation Medium-Range Forecast global model), and (6) FSUGSM. The reader is referred to the official documentation of the relevant operational centers for descriptions of the models.

4. Probabilistic Superensemble Technique

[30] In the same manner as the deterministic SE approach (see the companion paper, Shin and Krishnamurti [2003]), probabilistic SE forecasts can be formed as described below. The probability of an event E derived from the SE can be defined as

δi(E) = 1 if model i forecasts the event E, and δi(E) = 0 otherwise, (10)

Pr(E) = Σi wi δi(E), (11)

where Fi is a prediction by model i and the weights wi are normalized so that their sum is unity. For equally reliable models, wi = 1/N.

[31] One way of defining the weights associated with the different models making up the SE is to relate the weights to some skill score (ai) of interest at each grid point:

wi = ai^α / Σj aj^α (12)

Here α is a scale parameter. As an example, ai can be defined as the hit rate HRi minus the false alarm rate FARi for the ith model over the training period:

ai = HRi - FARi (13)

where the HR and the FAR are defined from a 2 × 2 contingency table shown in Table 1 for forecasts of a binary event as follows:

HR = H / (H + M), FAR = F / (F + R) (14)

The contingency table is a useful concept for verifying the occurrence of rainfall. In the table, H (hits) denotes the frequency of correctly predicted rain occurrences, F (false alarms) the frequency of rain forecasts for which no rain occurred, M (misses) the frequency of rain occurrences that were not predicted, and R (non-events) the frequency of correctly predicted non-rain occurrences. When analyzing forecasts for the presence of rain in the correct location, it is desirable to obtain a high HR in conjunction with a low FAR. Empirically, the best choice for α is usually 0.5.
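Given the contingency counts H, M, F, and R, the skill basis ai = HRi - FARi is straightforward to compute. The sketch below assumes HR = H/(H + M) and FAR = F/(F + R), both classified by the observations:

```python
def hit_rate(H, M):
    """Fraction of observed rain events that were correctly forecast."""
    return H / (H + M)

def false_alarm_rate(F, R):
    """Fraction of observed non-events that were falsely forecast as rain."""
    return F / (F + R)

def skill_basis(H, M, F, R):
    """a_i = HR_i - FAR_i, accumulated over the training period."""
    return hit_rate(H, M) - false_alarm_rate(F, R)
```

A model with 8 hits, 2 misses, 1 false alarm, and 9 correct non-events scores a = 0.8 - 0.1 = 0.7.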

Table 1. A Contingency Table of Forecast Performance^a

^a Letters H, M, F, and R represent the total numbers of occurrences of each contingency in the verification sample.


[32] An alternative is to define ai in equation (12) as the Brier skill score at each point during the training period. This approach assigns each model a weight commensurate with its skill. The probabilistic SE forecast, therefore, depends chiefly on how ai is defined.

[33] It must be noted that the weights wi vary in space; that is, the different models make varying relative contributions to the total probability depending on the spatial location of the point in question.
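Putting the pieces together, a point-wise SE POP might look like the following sketch. The power-law normalization of the weights via alpha is our assumption about the form of equation (12), and the guard against non-positive skill values is our own addition:

```python
def se_pop(forecast_rain, skill, threshold, alpha=0.5):
    """Superensemble POP at one point: each member's binary event forecast
    is weighted by a normalized power of its training skill score a_i
    (alpha scales the contrast between good and poor members)."""
    eps = 1e-6  # guard against zero or negative skill (our assumption)
    powered = [max(a, eps) ** alpha for a in skill]
    total = sum(powered)
    weights = [p / total for p in powered]
    return sum(w for w, r in zip(weights, forecast_rain) if r >= threshold)
```

With equal skill scores, the SE POP reduces to the conventional n/N fraction; a member with much higher training skill dominates the probability.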

[34] Another possible way to construct probabilistic SE forecasts is as follows: first, develop a set of deterministic SE forecasts from each model separately, which gives member SE forecasts Si. These Si are then used in place of Fi in equation (11) to produce a probability of precipitation (POP) forecast.

5. Results of Probabilistic Forecasts

[35] This section investigates the characteristics of probabilistic precipitation forecasts from each configuration of the multianalysis (MA), multicumulus-scheme (MC), and multimodel (MM), as well as all together (ALL). The impact of the SE approach on precipitation forecasts is also extensively explored in probabilistic terms. The Brier skill scores (BSSs) and relative operating characteristic (ROC) areas are the two main verification measures employed for this system of probabilistic forecasts.

[36] A standard measure of the skill of probabilistic forecasts, similar to the RMSE used for assessing the skill of deterministic forecasts, is the Brier score (B) described as:

B = (1/n) Σi (fi - oi)^2 (15)

where n denotes the number of forecast/event pairs. B is essentially the mean square error for probability forecasts of an event, where fi and oi are the forecast and observed probabilities, respectively; oi takes the value 1 when the event occurs and 0 when it does not. Following convention, the comparison of the skill of different forecasts is normally done in terms of a skill score. For any verification diagnostic X, the skill of a forecast relative to some reference forecast is given by

S = 100 × (Xf - Xr) / (Xp - Xr) (16)

where Xf is the value of X for the forecast, Xr for the reference forecast, and Xp for a perfect deterministic forecast. A skill score has a maximum value of 100% for a perfect forecast (Xf = Xp) and a value of zero for performance equal to that of the reference (Xf = Xr). S has no lower limit, with negative values representing poorer skill than the reference. Brier skill scores (BSS) are calculated by using B as X in equation (16). The BSS evaluates the improvement of a probabilistic forecast relative to a forecast of climatological probability and has served as the primary basis for verification of probability of precipitation (POP) forecasts. This study uses the sample climatology as its reference forecast.
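The Brier score and BSS translate directly into code (a sketch; the sample climatology is passed in as a constant reference probability, and the perfect Brier score is zero):

```python
def brier_score(f, o):
    """B = (1/n) * sum (f_i - o_i)^2 for probability forecasts f of the
    binary outcomes o (1 if the event occurred, 0 otherwise)."""
    return sum((fi - oi) ** 2 for fi, oi in zip(f, o)) / len(f)

def brier_skill_score(f, o, climatology):
    """BSS = 100 * (B_r - B_f) / B_r, since the perfect score B_p is zero."""
    b_f = brier_score(f, o)
    b_r = brier_score([climatology] * len(o), o)
    return 100.0 * (b_r - b_f) / b_r
```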

[37] The relative operating characteristic (ROC) curve is another measure of skill frequently used for probabilistic forecasting. The ROC measures the skill of a forecast in terms of a HR and a FAR (see equation (14)), both classified according to the observations. Because it is classified by observations, the ROC measures the ability of a forecast system to discriminate between two alternative outcomes. The ROC curve is constructed by computing HRs and FARs for a range of probability thresholds of an event, where the event is forecast to occur if the probability exceeds the threshold, and then plotting HR against FAR. The curve ranges from (0, 0) to (1, 1). If HR = 1 and FAR = 0, the forecast is maximally skillful; the forecast system has no skill if the ROC curve lies along the line HR = FAR. The area under the ROC curve provides an overall skill measure for the ensemble prediction system, with a maximum value of 1.0 for a perfect forecast and a value of 0.5 indicating no skill.

5.1. Results From T126 Forecasts

[38] We first explore the probabilistic rainfall forecasts from the T126 ensemble forecast system. The different systems are compared over a large sample of 137 cases covering the time period April 1 to August 15, 2000.

[39] Table 2 shows BSSs relative to the sample climatology for different levels of precipitation threshold verified for projections of 1–6 days over the global tropics (45°S–45°N), where no bias corrections are applied. To ensure greater stability, statistics are computed using all forecasts pooled in space and time, rather than by averaging the daily values. In the calculation of the BSS, a 2.5° latitude/longitude grid averaging is first applied to all models in order to mitigate the effect of phase errors in the NWP models. At this stage, no attempt is made to give greater weight to models with greater skill. The POP is estimated at each grid point as the proportion of model QPFs predicting rain at or above a given threshold.

Table 2. BSSs Relative to the Sample Climatology for Different Levels of Precipitation Threshold During Days 1 to 6 T126 Forecasts Over the Global Tropics (45°S to 45°N), No Bias Correction, April 1 to August 15, 2000
Threshold, mm/d | Ensemble Configuration | Forecast Days

[40] As the forecast lead time or threshold increases, the skill scores generally degrade for every configuration. The lowest skill scores come from the MA configuration, and the highest from the MM configuration. Compared to climatological forecasts, day 5 MM forecasts with the threshold greater than 10 mm d−1 exhibit 21% higher skill scores. This implies that, with proper ensemble members, meaningful precipitation forecasts can have much longer forecast lead times.

[41] Probabilistic skills of the individually bias-corrected ensemble rainfall forecasts are shown in Table 3; the format is the same as that of Table 2. Overall skill scores increase compared to those without bias correction. Significant increases in skill scores are achieved in the MA and the MC configurations, which implies that there exist large model biases in these configurations. Especially for lower thresholds (up to 10 mm d−1), the simple bias correction provides a drastic improvement in the forecasts. The bias correction in the MM configuration has a slightly detrimental effect on the skill scores for the 2 mm d−1 threshold. While the skill scores of the MM configuration are greater at the lower rain thresholds, the ALL configuration shows higher skills at higher thresholds.

Table 3. Same as Table 2 but for Bias Correction
Threshold, mm/d | Ensemble Configuration | Forecast Days

[42] Generally, the MM is consistently the more skillful configuration compared to the MA and MC, and the MA is generally less skillful than the other two systems. The benefit of the MM over the MA for precipitation forecasts can be attributed to the combination of different models. However, application of a simple bias correction to the ensemble members reduces the advantage of the MM ensemble system. The fact that the bias correction improves the skill scores of the MA and MC, but not the MM, suggests that an ensemble using a single model has a more consistent bias, which can at least partially be removed.
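The paper does not spell out the form of the "simple bias correction"; a minimal multiplicative version, rescaling each member by the ratio of observed to forecast mean rain over a training period, might look like this (the ratio form and the divide-by-zero guard are our assumptions):

```python
def bias_correct(forecast, train_fc_mean, train_obs_mean):
    """Rescale a member's rain forecast so that its training-period mean
    matches the observed mean (a simple multiplicative correction sketch)."""
    ratio = train_obs_mean / max(train_fc_mean, 1e-9)  # guard against zero mean
    return [r * ratio for r in forecast]
```

A member that rained twice as much as observed during training has its new forecasts halved before the POP is computed.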

[43] Figure 3a illustrates a map of POP for the 10 mm d−1 precipitation threshold for the day 5 MM T126 forecast with bias correction over the global tropics. The corresponding observed event occurrence (>10 mm d−1) is shown in the middle panel (Figure 3b), and the overlap of these two panels is shown in Figure 3c. The two fields correspond strongly, with a BSS of 24.95. This result clearly indicates that, with the aid of properly prepared ensemble members, the medium-range POP forecast contains a fair amount of skill.

Figure 3.

Maps of (a) POP, (b) event occurred, and (c) overlapping of Figures 3a and 3b for 10 mm d−1 of precipitation threshold for day 5 MM T126 forecast over the global tropics with bias correction (BSS = 24.95). Note regions of event occurred are masked with white in Figure 3c.

[44] Overlapping maps (same format as Figure 3c) for the MA (panel a), the MC (panel b), and the MM (panel c) configurations are shown in Figure 4. These are day 3 T126 probabilistic forecasts for the 10 mm d−1 precipitation threshold after bias correction. There is more disagreement between the POP and the observed events in the MA and MC than in the MM, especially over the Indian Ocean and the Gulf of Mexico, which results in less skillful POPs. Although the skill for the MA configuration is less than zero, there is good agreement between the POP forecast and the observations. It is therefore hard to say that random forecasts based on observational uncertainty are better than the MA ensemble forecast system. At the least, state-of-the-art numerical models predict reasonable signals for precipitation events, but they have inherent deficiencies in forecasting the proper magnitude and location of those events.

Figure 4.

Maps of overlapping of the POP and event occurred for (a) MA, (b) MC, and (c) MM configurations (same as Figure 3c). These are day 3 T126 forecasts for 10 mm d−1 of precipitation threshold with bias correction.

[45] Table 4 summarizes ROC areas for different precipitation thresholds during days 1 to 6 of the T126 forecasts over the global tropics for the MA, MC, MM, and ALL ensemble configurations. Bias corrections are applied, and the verification period runs from April 1 to August 15, 2000. To obtain a climatologically reliable reference forecast, ROC areas are also computed using observations in random temporal order. These reference values are 0.58, 0.56, 0.54, 0.52, and 0.51 for the 2, 5, 10, 25, and 50 mm d−1 thresholds, respectively; an ROC area larger than the corresponding reference value implies some skill in the ensemble prediction. Relatively high skill is obtained for every configuration and threshold. The explanation for these high skill scores is that the FARs in precipitation forecasts over the globe are small, so the overall ROC areas are usually overestimated, as is evident from the ROC curves for precipitation forecasts shown in Figure 5. False alarms are low because the models typically do not generate rain in climatologically unfavored areas, e.g., deserts.

Figure 5.

Examples of ROC curves for day 3 T126 precipitation forecasts with respect to the precipitation threshold greater than 5 mm d−1 over the global tropics, using (a) MA, (b) MC, (c) MM, and (d) ALL configurations.

Table 4. ROC Areas for Different Threshold Levels for up to Day 6 T126 Precipitation Forecasts Over the Global Tropics (With Bias Correction, April 1 to August 15, 2000)
Threshold, mm/d | Ensemble Configuration | Forecast Days

[46] The ROC curve is plotted as the HR versus the FAR. For a given probability of precipitation P, all forecasts with probability less than P are regarded as nonevent forecasts, while all forecasts with probability greater than or equal to P are considered forecasts of events. The HR and FAR are computed from the contingency table (Table 1) for that value of P, and this process is repeated for values of P between 0 and 1 to produce the ROC curve.
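The construction just described can be sketched directly. The code below is an illustrative implementation (the grid of 11 probability thresholds is a hypothetical choice): it sweeps P over [0, 1], tallies the 2x2 contingency table at each P, and integrates HR against FAR with the trapezoidal rule:

```python
import numpy as np

def roc_curve(pop, event, thresholds=np.linspace(0.0, 1.0, 11)):
    """HR and FAR at each probability threshold P: forecasts with
    probability >= P count as 'yes' forecasts, as described in the text."""
    hr, far = [], []
    for p in thresholds:
        yes = pop >= p
        hits = np.sum(yes & event)          # forecast yes, event occurred
        misses = np.sum(~yes & event)       # forecast no, event occurred
        false_alarms = np.sum(yes & ~event) # forecast yes, no event
        correct_neg = np.sum(~yes & ~event) # forecast no, no event
        hr.append(hits / max(hits + misses, 1))
        far.append(false_alarms / max(false_alarms + correct_neg, 1))
    return np.array(hr), np.array(far)

def roc_area(hr, far):
    """Trapezoidal area under the ROC curve, integrating HR over FAR."""
    order = np.argsort(far)
    x, y = far[order], hr[order]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * (x[1:] - x[:-1])))
```

A perfect probabilistic forecast yields an area of 1.0 and a no-skill forecast about 0.5, which is why the reference areas quoted above (0.51 to 0.58) sit just above the no-skill value.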

[47] The FARs of the MA configuration are much higher than those of the MM configuration, as is evident from the POP forecasts (Figure 4) and the ROC curves in Figure 5. The MM ensemble configuration shows consistently higher HRs and lower FARs than the MA or MC configuration, and it has the largest area under the ROC curve. The ROC areas of the ALL configuration are similar to those of the MM configuration.

5.2. Impacts of Higher-Resolution (T170) Forecasts

[48] Probabilistic precipitation forecasts at a higher resolution are explored here by repeating the procedures used above for the T126 forecasts. We examine how much improvement can be achieved merely by increasing the resolution of the numerical models. The ensemble configurations used for the T170 probabilistic precipitation forecasts are the same as those of the deterministic forecasts (see Table 2 of Shin and Krishnamurti [2003]).

[49] Tables 5 and 6 show the BSSs for the T170 ensemble configurations for different threshold values without and with bias correction, respectively; the format is the same as that for T126. Individual forecast skills are substantially improved by the higher-resolution models, especially for the MA and MC ensemble members, compared to those of T126. As a result, the BSSs without bias correction (Table 5) are much higher than those of T126 (Table 2) in the MA and MC configurations. Other properties are similar to those of T126. The slight skill improvement in the MM configuration relative to T126 is due to the single control FSU T170 forecast, whose improvement contributes to the MM BSS.

Table 5. Same as Table 2 but for T170 Forecasts
Threshold, mm/d | Ensemble Configuration | Forecast Days
Table 6. Same as Table 3 but for T170 Forecasts
Threshold, mm/d | Ensemble Configuration | Forecast Days

[50] Although overall skill scores increase relative to those without bias correction, there are no significant increases in skill in the MA and MM configurations through bias correction (Table 6), unlike the T126 case. This implies that higher-resolution forecasts have a greater ability to remove model biases. Bias-removed skill scores are sometimes lower than bias-included skill scores, as in the T126 case (e.g., the MM configuration for >2 mm d−1).

[51] It is interesting to examine the individual BSSs and the accumulated BSSs in order to determine the sensitivity of the ensemble forecasts to the number of ensemble members. These results are shown in Table 7. The Brier score of an individual model is worse than that of the sample climatology; that is, a climatological precipitation forecast is better than an individual NWP forecast in a statistical sense. However, as an increasing number of forecasts is used, the skill gradually becomes better than that of the reference forecast. The accumulated BSSs peak when only eight members are included in an intelligent manner, that is, by starting with the most skillful models, in both the T126 and T170 ensemble forecasts. As shown in the table, adding further ensemble members with lower skill can degrade the collective ensemble forecast.

Table 7a. BSSs Relative to the Sample Climatology for 10 mm d−1 of Threshold for Day 3 Precipitation Forecast Over the Global Tropics (With Bias Correction, April 1 to August 15, 2000): Individual BSSs
Ensemble Member | Individual BSS
Table 7b. Same as Table 7a but for Accumulated BSSs (From Best to Worst)
Number of Members | Accumulated BSS

[52] Just as for the T126 runs, ROC areas for the T170 forecasts are computed and shown in Table 8. Compared to T126, a higher-resolution impact is not evident in the skill scores for any configuration. This occurs because the FARs are substantially decreased, owing to the reduced bias in the T170 MA and MC forecasts relative to T126, but the HRs are also decreased, compensating for the skill gained in the FARs. Hence the final ROC areas of the T170 forecasts are similar to those of the T126 forecasts. While the MM configuration shows the best ROC areas for lower thresholds, the ALL ensemble configuration shows the best ROC areas for higher thresholds. The ROC curves are not shown here because they are similar to those of T126, except that the curves are more elliptical and lie closer to the FAR axis.

Table 8. Same as Table 4 but for T170 Forecasts
Threshold, mm/d | Ensemble Configuration | Forecast Days

5.3. Superensemble Probabilistic Forecasts

[53] Following the basic idea of the SE probability of an event defined in section 4, we investigate the probabilistic SE approach applied to the ensemble configurations employed in this study, explore its impact on probabilistic precipitation forecasts, and compare the results to the conventional approaches shown in the previous two subsections.

[54] Probabilistic skills of the T126 SE precipitation forecasts are shown in terms of the BSS in Table 9. The values in the table are the SE BSSs computed relative to the bias-corrected Brier score, not relative to that of the sample climatology. The BSSs are organized by precipitation threshold and ensemble configuration during days 1 to 6.
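Measuring the SE against the bias-corrected forecast rather than against climatology changes only the reference term in the skill score. A minimal sketch (function and array names are illustrative):

```python
import numpy as np

def bss_vs_reference(pop_se, pop_ref, event):
    """BSS of an SE POP forecast measured against a reference POP forecast
    (here the bias-corrected one) instead of the sample climatology:
    BSS = 1 - BS_SE / BS_ref, so positive values mean the SE beats the
    bias-corrected forecast."""
    e = event.astype(float)
    bs_se = np.mean((pop_se - e) ** 2)
    bs_ref = np.mean((pop_ref - e) ** 2)
    return float(1.0 - bs_se / bs_ref)
```

With this convention a value of 0 means no improvement over the bias-corrected forecast, which is the baseline against which the 10 to 20 percent gains below are quoted.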

Table 9. Same as Table 2 but for Superensemble BSSs Relative to the Bias-Corrected Brier Scores (August 1 to August 15, 2000)
Threshold, mm/d | Ensemble Configuration | Forecast Days

[55] The SE statistics (i.e., weights) for probabilistic forecasts are computed during the training period (April 1 to July 31, 2000). Figure 6 displays an example of those weight maps. It is evident from this figure that each model has its own regime of superiority; for example, model number 2 shows higher reliability over North Africa and Saudi Arabia. Using those statistics, probabilistic SE precipitation forecasts are made for August 1 to August 15, 2000. Excluding the 25 mm d−1 threshold, every skill score is 10 to 20 percent better than the bias-corrected one. The SE approach turns out to be most beneficial for the MA and MC configurations.
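The gridpointwise weighting can be sketched as below. This is a hypothetical stand-in for the paper's actual SE statistics, which come from its training-period formulation in section 4 and are not reproduced here: each member's weight at a grid point is taken inversely proportional to its training-period Brier score there, and the SE POP is the weighted fraction of members forecasting the event:

```python
import numpy as np

def se_pop(member_yes_train, obs_train, member_yes_fcst, eps=1e-6):
    """Gridpointwise weighted POP (illustrative weighting, not the paper's
    exact SE regression).

    member_yes_train: (n_members, n_days, n_points) boolean training forecasts
    obs_train:        (n_days, n_points) boolean training observations
    member_yes_fcst:  (n_members, n_points) boolean forecast indicators
    """
    # Per-member, per-point training error (Brier score of 0/1 forecasts)
    err = np.mean((member_yes_train.astype(float)
                   - obs_train.astype(float)) ** 2, axis=1)
    w = 1.0 / (err + eps)              # skillful members get large weights
    w /= w.sum(axis=0, keepdims=True)  # normalize weights to 1 at each point
    return np.sum(w * member_yes_fcst, axis=0)  # POP in [0, 1] per point
```

Because the weights vary from point to point, a member that is reliable over, say, North Africa can dominate the SE POP there while contributing little elsewhere, matching the regime-dependent behavior seen in the weight maps of Figure 6.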

Figure 6.

Weight maps for each model (0.0 to 1.0) developed in the probabilistic superensemble approach within the MM configuration.

[56] As an illustration of probabilistic SE precipitation forecasts, Figure 7 compares example maps of POP from day 3 bias-corrected (BC) and day 3 SE forecasts with the map of events that occurred, valid for August 16, 2000. The SE approach produces far more organized POPs, whose patterns resemble the observed events. The BSS of the SE forecast with respect to the bias-corrected forecast is approximately 15% in this example. This result reflects a major improvement over the conventional methods of probabilistic forecasting.

Figure 7.

Maps of (a) POP from a bias-corrected (BC) forecast, (b) POP from a superensemble (SE) forecast, (c) events that occurred, (d) the overlap of Figures 7a and 7c, and (e) the overlap of Figures 7b and 7c for the 5 mm d−1 precipitation threshold from the day 3 MM T126 forecast over the global tropics.

[57] The superiority of SE probabilistic forecasts over the conventional ones is also evident in terms of ROC areas. Forecasting with the SE technique produces better overall ROC areas than bias-corrected ensemble forecasting (Table 4), particularly for lower precipitation thresholds (Table 10). The improvements in ROC areas through the SE technique are shown more clearly in Figure 8: the SE ROC curves lie to the left of the bias-corrected ones, indicating larger areas under the curves. The MA, MC, and ALL ensemble configurations all show clear improvements in ROC areas, while the MM configuration shows the least improvement. The resulting ROC areas of the MM and ALL configurations are almost equivalent.

Figure 8.

Superensemble ROC curves compared to those of bias-corrected (BC) forecasts for day 3 T126 precipitation forecasts with respect to the precipitation threshold greater than 5 mm d−1 over the global tropics, using the (a) MA, (b) MC, (c) MM, and (d) ALL configurations.

Table 10. Same as Table 4 but for Superensemble ROC Areas (August 1 to August 15, 2000)
Threshold, mm/d | Ensemble Configuration | Forecast Days

[58] Since probabilistic SE precipitation forecasts with the T170 models led to similar conclusions, they are not shown in this paper.

6. Conclusions

[59] This study explored conventional and superensemble (SE) probabilistic precipitation forecasts in two families of forecasts at different resolutions, using several ensemble configurations devised herein: the multianalysis (MA), the multicumulus-scheme (MC), and the multimodel (MM) configurations, as well as all members together (ALL). One of the satellite-measured rain-rate products was treated as the benchmark observation and used in forecast verification. The Brier skill score (BSS) and the relative operating characteristic (ROC) area were the two main verification measures employed for this system of probabilistic forecasts.

[60] It was shown that as the forecast lead time or the threshold increases, the skill scores of probability of precipitation (POP) forecasts usually degrade for every configuration. The MM is consistently the most skillful configuration, compared to the MA and MC, and the MA is the least skillful of the three. A simple bias correction provides an appreciable improvement not in the MM but in the MA and MC T126 forecasts, which implies that large model biases exist in the latter two ensemble configurations. This suggests that an ensemble system using a single model has a more consistent bias, which can be at least partially removed. With the help of a properly organized ensemble system (such as the MM), POP forecasts show much better medium-range forecast ability. The MM ensemble configuration shows consistently higher hit rates and lower false alarm rates than the MA or MC configuration, and it has the largest area under the ROC curve.

[61] Similar conclusions were drawn from the higher-resolution POP forecasts. It was noted, however, that the family of higher-resolution forecasts usually has a greater ability to remove model biases. The accumulated BSS peaked when only the eight most skillful members were included in the POP forecasts.

[62] Probabilistic SE forecasts were made by giving greater weight to models with greater skill, with the weights computed independently at each grid point during the training period. The SE forecasts were 10 to 20 percent better than the bias-corrected forecasts for most configurations and threshold values, and the SE approach turned out to be most beneficial for the MA and MC configurations. The SE approach also produced far more organized POPs, whose patterns closely resemble the actual observed events.

[63] As mentioned in part 1 [Shin and Krishnamurti, 2003], the Global Precipitation Measurement (GPM) mission is expected to provide better rainfall observations and hence better guidance for POP forecasts at 3-h intervals.


Acknowledgments

[64] This work could not have been completed without the assistance of the TRMM/TSDIS data centers. We are especially thankful to the modeling groups from BMRC, JMA, NCEP, NRL, and RPN for providing the global data sets. We wish to acknowledge the ECMWF (Tony Hollingsworth) for providing the base data sets for this study. The authors' sincere appreciation is owed to Steven D. Cocke for his many fruitful discussions on convective schemes, parallel programming of numerical models, and ensemble prediction systems. This work was supported by the following grants: NASA: NAG5-4729 and NAG5-9662; NSF: ATM-9710336 and ATM-9910526; NOAA: NA86GP0031 and NA77WA0571; and FSU Research Foundation Cornerstone Award. Computational support was provided by ACNS at the FSU.