In this OSE study, the model runs are verified against observations only. The verification is carried out using GPS ZTD data, radiosonde data at standard pressure levels and surface-precipitation data.
5.3. Impact on surface parameters
Next, the verification scores of categorical forecasts of accumulated precipitation are studied. The precipitation accumulations are measured every 12 h at the synoptic observing stations. The subset of the synoptic observing network that is constituted by the station list of the European Working Group on Limited Area Modelling (EWGLAM) is used. The categorical forecasts are binary forecasts that provide answers to questions such as: ‘Will the 12 h accumulated precipitation exceed a given threshold?’. Any categorical forecast falls into one of four classes depending on whether a particular event was both observed and forecast (hit, H), forecast but not observed (false alarm, FA), observed but not forecast (missed forecast, MF) or neither forecast nor observed (correct no-forecast, CN). As illustrated in Figure 8, for a population of N forecasts, the number of forecasts falling into each class can be given in a 2 × 2 contingency table, which allows further determination of various categorical verification scores. In this OSE study, we make use of four specific scores, which are defined as follows.
Probability of detection (POD; varies in the range [0, 1]).
False-alarm rate (FAR; varies in the range [0, 1]).
True-skill score (TSS; varies in the range [−1, 1]).
Equitable-threat score (ETS; varies in the range [−1/3, 1]).
Apart from FAR, these verification scores increase with increasing forecast skill.
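The four scores can be sketched in a few lines of Python. The formulas used here are assumptions rather than the definitions of this study: POD and ETS follow the usual textbook forms, while FAR is taken as FA/(H + FA) and TSS as POD − FAR, because these forms agree with the signs and sums of the differences listed in Tables I to III to within rounding. The class name, function names and sample accumulations are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    """2 x 2 contingency table for one precipitation threshold."""
    H: int   # hits: event forecast and observed
    FA: int  # false alarms: forecast but not observed
    MF: int  # missed forecasts: observed but not forecast
    CN: int  # correct no-forecasts: neither forecast nor observed


def build_table(forecast_mm, observed_mm, threshold_mm):
    """Classify paired accumulations; the event is 'amount exceeds threshold'."""
    H = FA = MF = CN = 0
    for f, o in zip(forecast_mm, observed_mm):
        fc, ob = f > threshold_mm, o > threshold_mm
        if fc and ob:
            H += 1
        elif fc:
            FA += 1
        elif ob:
            MF += 1
        else:
            CN += 1
    return Contingency(H, FA, MF, CN)


def scores(t):
    """Return POD, FAR, TSS and ETS using the assumed forms described above.

    Raises ZeroDivisionError if no event was observed or none was forecast.
    """
    n = t.H + t.FA + t.MF + t.CN
    pod = t.H / (t.H + t.MF)
    far = t.FA / (t.H + t.FA)
    # Assumption: TSS = POD - FAR. The common Hanssen-Kuipers form would
    # subtract FA / (FA + CN) instead.
    tss = pod - far
    h_random = (t.H + t.MF) * (t.H + t.FA) / n  # hits expected by chance
    ets = (t.H - h_random) / (t.H + t.FA + t.MF - h_random)
    return {"POD": pod, "FAR": far, "TSS": tss, "ETS": ets}


# Hypothetical 12 h accumulations (mm) at six stations, 1 mm threshold.
forecast = [0.0, 2.5, 4.0, 0.5, 1.2, 0.0]
observed = [0.0, 3.0, 0.2, 1.5, 2.0, 0.0]
table = build_table(forecast, observed, threshold_mm=1.0)
print(table, scores(table))
```

Keeping the classification separate from the score calculation means the same contingency table can be reused for any further score derived from H, FA, MF and CN.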
It would perhaps be preferable to use precipitation accumulated over periods shorter than 12 h. Unfortunately, most of the SYNOP stations do not report precipitation accumulations over shorter intervals. Shorter accumulation times could be investigated using ground-based radar measurements, but the additional uncertainties and difficulties inherent in radar measurement would further complicate the analysis.
Table I shows the constituents of contingency tables, as well as POD, FAR, TSS and ETS, for 12 h forecasts from CNTL, RGPS and BGPS; the scores of the two latter runs are given relative to those of CNTL. The thresholds of precipitation are chosen such that increasing forecast lead time is always associated with decreasing forecast skill. This criterion results in thresholds of 1, 3 and 10 mm being the most relevant ones.
Table I. Constituents of contingency tables and categorical forecast-verification scores for 12 h forecasts of accumulated precipitation.
| Run | Threshold (mm) | H | FA | MF | CN | POD | FAR | TSS | ETS |
|---|---|---|---|---|---|---|---|---|---|
| RGPS | 1 | +16 | +29 | −16 | −29 | +0.008 | +0.002 | +0.006 | 0.000 |
| BGPS | 1 | −17 | −54 | +17 | +54 | −0.008 | −0.005 | −0.003 | +0.003 |
| RGPS | 3 | +27 | +16 | −27 | −16 | +0.020 | −0.004 | +0.024 | +0.009 |
| BGPS | 3 | −16 | −56 | +16 | +56 | −0.012 | −0.011 | −0.001 | +0.003 |
| RGPS | 10 | +2 | −1 | −2 | +1 | +0.005 | −0.003 | +0.008 | +0.003 |
| BGPS | 10 | −8 | −26 | +8 | +26 | −0.019 | −0.007 | −0.011 | −0.002 |
The contingency-table values for 12 h forecasts indicate that, at the two lowest thresholds, the number of hits is approximately the same as the number of false alarms, while the number of missed forecasts is much smaller. This means that precipitation events are forecast more often than they are observed, i.e. the forecast system is biased towards producing too much precipitation. This tendency is slightly enhanced when the ZTD data are assimilated in the regular mode (without bias correction), whereas it is reduced, and by a larger margin, when the ZTD observation biases are accounted for. When higher precipitation amounts are considered, the numbers of both false alarms and missed forecasts exceed the number of hits. The interpretation is that forecast skill becomes increasingly limited as higher precipitation amounts are forecast.
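The over-forecasting argument can be made concrete with the ratio of forecast events to observed events, (H + FA)/(H + MF), often called the frequency bias; it is not one of the four scores used in this study. In the sketch below the baseline counts are hypothetical and only mimic the pattern described for the low thresholds (H close to FA, MF much smaller), whereas the increments are the 1 mm differences listed for RGPS and BGPS in Table I.

```python
def frequency_bias(H, FA, MF):
    """Ratio of forecast events (H + FA) to observed events (H + MF)."""
    return (H + FA) / (H + MF)


# Hypothetical baseline counts standing in for CNTL (not taken from the paper).
H0, FA0, MF0 = 1000, 1000, 300

cntl = frequency_bias(H0, FA0, MF0)
# Differences relative to CNTL at the 1 mm threshold, as listed in Table I.
rgps = frequency_bias(H0 + 16, FA0 + 29, MF0 - 16)
bgps = frequency_bias(H0 - 17, FA0 - 54, MF0 + 17)

# A value above 1 means events are forecast more often than observed.
print(f"CNTL {cntl:.3f}  RGPS {rgps:.3f}  BGPS {bgps:.3f}")
```

Because H + MF (the number of observed events) is the same in every run, the ordering of the three ratios depends only on the change in H + FA, not on the hypothetical baseline.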
In terms of the verification scores, both RGPS and BGPS fail to provide a clear impact over CNTL at the smallest threshold. The differences between the runs are very small. At a threshold of 3 mm, RGPS provides a small positive impact, as revealed by the increased POD, TSS and ETS, and decreased FAR compared with CNTL. At a threshold of 10 mm, the scores of RGPS are only marginally better than those of CNTL. The verification scores of BGPS are mainly similar to or slightly worse than those of CNTL. It should be noted that while the absolute values of FAR are relatively large (higher than 0.5), there is a potentially important positive impact in BGPS revealed by the decreased FAR.
The verification scores for categorical 18 h forecasts are given in Table II. The impact of GPS data assimilation appears to be less positive for 18 h forecasts than for 12 h forecasts. At a threshold of 10 mm, the verification scores are systematically degraded when GPS data are assimilated. At thresholds of 1 and 3 mm, there is a marginal improvement in TSS and POD in RGPS, but at the same time FAR is degraded. At these smaller thresholds, BGPS shows a mainly neutral impact compared with CNTL.
Table II. As Table I, but for 18 h forecasts.
| Run | Threshold (mm) | H | FA | MF | CN | POD | FAR | TSS | ETS |
|---|---|---|---|---|---|---|---|---|---|
| RGPS | 1 | +25 | +72 | −25 | −72 | +0.012 | +0.006 | +0.005 | −0.003 |
| BGPS | 1 | −6 | +1 | +6 | −1 | −0.003 | +0.001 | −0.004 | −0.002 |
| RGPS | 3 | +17 | +58 | −17 | −58 | +0.013 | +0.009 | +0.003 | −0.003 |
| BGPS | 3 | −6 | −12 | +6 | +12 | −0.004 | −0.001 | −0.003 | −0.001 |
| RGPS | 10 | −7 | −1 | +7 | +1 | −0.016 | +0.009 | −0.024 | −0.008 |
| BGPS | 10 | −3 | +6 | +3 | −6 | −0.007 | +0.008 | −0.014 | −0.005 |
In contrast to the 18 h forecasts, the verification scores of 24 h forecasts (Table III) show a mainly positive impact from GPS data assimilation. In the case of RGPS, there is a clear positive impact at all thresholds. At a threshold of 10 mm, the positive impact is further enhanced when the observation biases are corrected for.
Table III. As Table I, but for 24 h forecasts.
| Run | Threshold (mm) | H | FA | MF | CN | POD | FAR | TSS | ETS |
|---|---|---|---|---|---|---|---|---|---|
| RGPS | 1 | +26 | −22 | −26 | +22 | +0.012 | −0.007 | +0.020 | +0.010 |
| BGPS | 1 | −4 | −60 | +4 | +60 | −0.002 | −0.008 | +0.006 | +0.007 |
| RGPS | 3 | +16 | +14 | −16 | −14 | +0.012 | −0.001 | +0.014 | +0.005 |
| BGPS | 3 | −10 | −9 | +10 | +9 | −0.008 | +0.001 | −0.008 | −0.003 |
| RGPS | 10 | +7 | +14 | −7 | −14 | +0.016 | −0.002 | +0.018 | +0.005 |
| BGPS | 10 | +9 | −1 | −9 | +1 | +0.021 | −0.013 | +0.034 | +0.012 |
At longer forecast lead times the impact of GPS data is more mixed (not shown). It is noteworthy that the impact of GPS data on forecasts of accumulated precipitation is positive in both 12 and 24 h forecasts, but neutral or negative in 18 h forecasts. This is possibly related to observing practices at the synoptic stations. The accumulated precipitation is observed at 0600 and 1800 UTC each day. This means that the forecasts that can be verified against observations of accumulated precipitation are either 12 and 24 h forecasts from analyses valid at 0600 and 1800 UTC, or 18 h forecasts from analyses valid at 0000 and 1200 UTC. As far more radiosonde data are usually available for the analysis at 0000 and 1200 UTC than at 0600 and 1800 UTC, the GPS data are likely to influence the humidity analysis more at 0600 and 1800 UTC, whereas the analyses at 0000 and 1200 UTC are more likely to be dictated by the radiosonde data.
However, the results of the verification against radiosonde data do not support this hypothesis. All of the radiosonde observations used for verification in the previous subsection are made at either 0000 or 1200 UTC. Therefore, if there were a semi-diurnal cycle in the amount by which the GPS ZTD data influence the analysis, RGPS and BGPS should verify relatively better in 06, 18, 30 and 42 h forecasts than in 12, 24, 36 and 48 h forecasts. Such a 12 h cycle is not detected in the standard scores from the radiosonde verification.
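The pairing of lead times with analysis times used in this argument follows from subtracting the lead time from the verification time, modulo 24 h. The short sketch below reproduces both groupings; the function name is hypothetical, and the observation hours are those stated in the text.

```python
def analysis_hours(valid_hours_utc, lead_time_h):
    """UTC hours of the analyses whose forecasts verify at the given hours."""
    return sorted({(v - lead_time_h) % 24 for v in valid_hours_utc})


# Accumulated precipitation is observed at 0600 and 1800 UTC.
for lead in (12, 18, 24):
    print(f"precipitation, {lead:2d} h forecast: analyses at",
          analysis_hours((6, 18), lead))

# Radiosonde observations used for verification are made at 0000 and 1200 UTC.
for lead in range(6, 49, 6):
    print(f"radiosonde,    {lead:2d} h forecast: analyses at",
          analysis_hours((0, 12), lead))
```

For the precipitation case, lead times of 12 and 24 h map to analyses at 0600 and 1800 UTC and 18 h maps to 0000 and 1200 UTC; for the radiosonde case, lead times of 6, 18, 30 and 42 h map to analyses at 0600 and 1800 UTC, which is why a 12 h cycle in the radiosonde scores would be expected under the hypothesis.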