Geophysical Research Letters

Impact of online empirical model correction on nonlinear error growth

Abstract

[1] The purpose of this study is to compare two methods of correcting the bias of a GCM; namely statistical correction performed a posteriori (offline) as a function of forecast length, and correction done within the model integration (online). The model errors of a low resolution GCM are estimated by the 6-hour forecast residual averaged over several years and used to correct the model. Both the offline and online corrections substantially reduce the model bias when applied to independent data. Their performance in correcting the model error is comparable at all lead times, but for lead times longer than 1-day the online corrected forecasts have smaller RMS forecast errors and larger anomaly correlations than offline corrected forecasts. These results indicate that the online correction reduces not only the growth of the bias but also the nonlinear growth of non-constant (state-dependent and random) forecast errors during the model integration.

1. Introduction

[2] Weather and climate predictions have improved dramatically over the last decade as a result of powerful computers being coupled with increasingly sophisticated data assimilation and ensemble forecasting techniques. However, faster computers and more accurate probabilistic estimates of the current state of the atmosphere are necessary, but not sufficient conditions for improved forecasts. Model errors, which include imperfect numerical discretizations of the equations of motion and deficiencies in the parameterizations used to represent the effect of sub-grid scale physical processes, result in systematic forecast errors and constitute an important component of the uncertainty observed in weather and climate predictions. As the methods of data assimilation, ensemble forecasting, and observing the Earth's climate become more sophisticated, the impact of model deficiencies becomes relatively more important [Kalnay, 2003].

[3] Mathematical methods for increasing the usefulness of forecasts made by a General Circulation Model (GCM) can be grouped into three categories: those which aim to (a) improve the initial state by optimally combining observations with forecasts (i.e., construction of analysis states by data assimilation [Anderson, 2001; Hunt et al., 2004; Danforth and Yorke, 2006]), (b) improve estimates of forecast uncertainties by estimating the growing errors of the day (e.g., ensemble forecasting [Toth and Kalnay, 1993]), and (c) identify and reduce the systematic model error (e.g., bias correction [Leith, 1978; Klinker and Sardeshmukh, 1992]).

[4] Model error can be diagnosed by separating the time series of short forecast residuals (difference between an analysis and the forecast) into state-independent (constant), state-dependent (function of model state), and random (noise) components. This study focuses on estimation of the constant component of model error, generally referred to as the bias. Corrections of the bias can either be made offline, after the forecast has been created [Glahn and Lowry, 1972], or online, by nudging the model forcing during the integration. One advantage of making the corrections offline, by far the most commonly used method in operational numerical weather or climate prediction, is its simplicity: for each N-hr forecast lead time, one adds the mean N-hr residual (error correction) estimated during the training period. This correction is easily estimated if enough forecasts and verifying analyses are available. A disadvantage is that after a short time, errors grow nonlinearly, and the correction of averaged nonlinear errors obscures their physical origin.
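The offline procedure described above can be sketched in a few lines. This is a minimal illustration rather than the code used in the study: the function names `mean_residual` and `offline_correct` and the scalar toy data are assumptions made for the example.

```python
from statistics import fmean

def mean_residual(analyses, forecasts):
    """Average residual (analysis minus forecast) over a training sample."""
    return fmean(a - f for a, f in zip(analyses, forecasts))

def offline_correct(forecast_by_lead, bias_by_lead):
    """Add the training-period mean residual for each lead time to the forecast."""
    return {lead: value + bias_by_lead[lead]
            for lead, value in forecast_by_lead.items()}

# Hypothetical scalar training data, keyed by lead time in hours.
train_analyses = {6: [10.0, 12.0, 11.0], 12: [10.5, 12.5, 11.5]}
train_forecasts = {6: [9.5, 11.5, 10.5], 12: [9.5, 11.5, 10.5]}

# One mean residual per lead time, estimated during training.
bias_by_lead = {lead: mean_residual(train_analyses[lead], train_forecasts[lead])
                for lead in train_analyses}

# A new uncorrected forecast, corrected a posteriori at each lead time.
corrected = offline_correct({6: 20.0, 12: 21.0}, bias_by_lead)
print(bias_by_lead)
print(corrected)
```

Note that a separate correction field is needed for every lead time, which is why the offline method requires forecasts and verifying analyses at all lead times of interest.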

[5] An advantage of online correction is that the nonlinear growth of the bias is reduced during the integration, decreasing the cumulative effect of model error. Also, correction fields need only be computed for a single forecast length. Online correction provides continuously corrected forecasts at all lead times, and can be considered as an interim, empirical estimation of the errors in the model. A possible disadvantage of online correction is that the estimated residual added to the model forcing, if large, may interfere with the physical balance of model variables (e.g., geostrophy). Another concern is that model parameters may have been tuned to minimize errors in the (biased) model tendency, and may be less than optimal for the online corrected model. However, the errors introduced by these very parameterizations are the focus of the correction, and improvements of the parameterizations (or the numerical discretizations) can be tested by the extent to which they reduce the magnitude of empirical corrections. The parameter corrections necessitated by the model may even suggest physically meaningful sources of model error.

[6] Empirical correction has demonstrated varied performance in previous studies; some show that state-independent correction improves the random forecast error [Johansson and Saha, 1989; Yang and Anderson, 2000; Danforth et al., 2007], while others find that the random error is unchanged [Saha, 1992; DelSole and Hou, 1999]. DelSole et al. [2008] suggest that the discrepancy arises because the resolutions of the GCMs used by the studies vary, leading the dynamical error components to have vastly different magnitudes. For example, in a toy model large bias corrections can improve the random errors, while in a state-of-the-art model the bias correction is likely to be much smaller and thus have less impact on random errors. This study compares the performance of online and offline empirical correction using a low-resolution GCM.

2. Numerical Experiment

[7] Following Leith [1978], consider an arbitrary $N_D$-dimensional dynamical system $\dot{x}(t) = M(x(t))$, where $x(t)$ and $M(x(t))$ are the model state vector and model tendency at step $t$, respectively. The model M is the best available representation of the governing dynamics of the physical process whose future behavior we are attempting to predict. Let $x^a(t)$ denote an estimate of the true state of the dynamical system at time $t$ (obtained, for example, from an analysis) and let $x^f_{\Delta t}(t)$ denote a prediction of $x^a(t)$ generated by integrating M for time $\Delta t$ from the state $x^a(t - \Delta t)$. The residual at step $t$ is given by the difference between the approximate truth $x^a(t)$ and the model forecast state $x^f_{\Delta t}(t)$, namely $\delta x_{\Delta t}(t) = x^a(t) - x^f_{\Delta t}(t)$, where $\Delta t$ is the forecast lead time. The smaller $\Delta t$, the better $\delta x_{\Delta t}(t)$ approximates the instantaneous model tendency error associated with the state $x^f_{\Delta t}(t)$. The time averages of the residuals, forecasts, and analyses give estimates of the model bias, the model climatology, and the analysis climatology, respectively.

[8] For this study, we have chosen M to be a relatively low-resolution ($N_D \approx 10^5$ variables) but realistic primitive-equation atmospheric model known as SPEEDY [Molteni, 2003]. The basic prognostic variables are vorticity (ζ), divergence (∇), absolute temperature (T), specific humidity (Q), and the logarithm of surface pressure ($\log(p_s)$). These variables are post-processed into zonal and meridional wind (u, v), geopotential height (Z), T, Q, and $\log(p_s)$ at pressure levels (925, 850, 700, 500, 300, 200, 100 hPa). The true state of the dynamical system, namely $x^a$, is obtained (without adding observational noise) from the NCEP Reanalysis [Kalnay et al., 1996], a widely used approximation of the state of the Earth's atmosphere over the last 60 years. Training is performed by creating residuals from 7-dy forecasts of Reanalysis states, available every 6-hrs during January 1981–1985 ($N_a = 620$). Average forecast errors for lead times between 6-hrs and 7-dys are computed and denoted $\langle \delta x_{6i} \rangle$. Note that offline correction requires computation of the bias for i = 1, 2, 3, …, 28, while online correction requires only i = 1.
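As a quick check of the sample sizes quoted above, assuming four Reanalysis states per day (one every 6 hrs) and five 31-day Januaries, the number of training states and the number of 6-hr lead times in a 7-dy forecast work out as

$$N_a = 5 \times 31 \times 4 = 620, \qquad i_{\max} = \frac{7 \times 24\ \text{hrs}}{6\ \text{hrs}} = 28.$$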

[9] The online corrected model is generated by adding the average 6-hr residual observed during the training period to the time derivative predicted by the model M. The online corrected dynamical system is then given by $\dot{x}(t) = M(x(t)) + \frac{\langle \delta x_6 \rangle}{6\text{-hr}} \equiv M_1(x(t))$. The term $\frac{\langle \delta x_6 \rangle}{6\text{-hr}}$ represents the time-average 6-hr forecast error of M estimated during the training period, normalized to have the correct weight when added to M.
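The construction of the online corrected model can be illustrated with a toy problem. The sketch below is not SPEEDY and not the authors' code: the scalar relaxation "nature" run, the biased model, the forward-Euler integrator, and all names (`truth`, `model`, `model_online`, `integrate`) are assumptions chosen only to show the mean 6-hr residual being converted to a tendency and added to the model.

```python
def step(x, tendency, dt=1.0):
    """Advance one forward-Euler step of length dt (hours)."""
    return x + dt * tendency(x)

def integrate(x, tendency, hours):
    """Integrate for a whole number of hours using 1-hr Euler steps."""
    for _ in range(hours):
        x = step(x, tendency)
    return x

# Hypothetical scalar "nature" and biased model (rates in 1/hr).
def truth(x):
    return 0.05 * (10.0 - x)

def model(x):
    return 0.05 * (8.0 - x)

# Training: 6-hr forecasts started from successive "analyses" (here, the truth run).
analyses = [0.0]
for _ in range(40):
    analyses.append(integrate(analyses[-1], truth, 6))
residuals = [xa - integrate(x_prev, model, 6)
             for x_prev, xa in zip(analyses, analyses[1:])]
mean_residual = sum(residuals) / len(residuals)

def model_online(x):
    """M1 = M + <dx_6> / 6 hr: mean 6-hr residual added as a constant tendency."""
    return model(x) + mean_residual / 6.0

# Testing: 7-dy (168-hr) forecasts from an independent initial state.
x0 = 3.0
x_true = integrate(x0, truth, 168)
err_raw = x_true - integrate(x0, model, 168)
err_online = x_true - integrate(x0, model_online, 168)
print(f"7-dy error: uncorrected {err_raw:.3f}, online corrected {err_online:.3f}")
```

In this linear toy the residual is state-independent, so the example only shows the mechanics of the correction. It does not reproduce the nonlinear error growth discussed in the text.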

[10] The online corrected model $M_1$ was evaluated by creating 7-dy forecasts of Reanalysis states every 6-hrs during January 1986–1990 (note that the training and testing periods are independent and equal in length). The offline corrected model was evaluated by taking 7-dy forecasts made by M and adding the field $\langle \delta x_{6i} \rangle$ at time 6i-hrs. The resulting sets of forecasts are compared with uncorrected forecasts made by M. Performance for each method is then characterized by a set of forecast accuracy metrics. The root mean square error (RMSE) can be decomposed into contributions from the bias and non-constant errors at time t and latitude ϕ by

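$$\mathrm{RMSE}^2 = \frac{1}{N_L} \sum_{\lambda=1}^{N_L} \langle \delta x_{\Delta t} \rangle^2 + \frac{1}{N_L} \sum_{\lambda=1}^{N_L} \langle (\delta x'_{\Delta t})^2 \rangle \qquad (1)$$

where the sums run over longitude, $N_L$ is the number of longitudinal grid points at latitude ϕ ($N_L = 96$ for the SPEEDY model) and $\delta x'_{\Delta t}$ is the anomalous forecast residual (i.e., $\delta x'_{\Delta t} = \delta x_{\Delta t} - \langle \delta x_{\Delta t} \rangle$). The first term is the contribution of the bias and the second is the contribution of the non-constant errors. The anomaly correlation (AC) is computed by taking the inner product of the forecast anomaly and the Reanalysis anomaly with respect to their corresponding climatologies, and normalizing so that a perfect forecast has AC = 1. It is common to consider that the forecast remains useful if AC > 0.6.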
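The two metrics can be sketched as follows. This is an illustrative implementation under simple assumptions (a single grid point for the error decomposition, plain lists for fields, no area weighting), and the function names are hypothetical rather than taken from the study.

```python
import math

def decompose_mse(residuals):
    """Split the time-mean squared residual at one grid point into
    squared bias and non-constant (anomalous) variance."""
    n = len(residuals)
    bias = sum(residuals) / n
    non_constant = sum((r - bias) ** 2 for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    return mse, bias ** 2, non_constant

def anomaly_correlation(forecast, analysis, clim_forecast, clim_analysis):
    """Normalized inner product of forecast and analysis anomalies,
    each taken with respect to its own climatology."""
    fa = [f - c for f, c in zip(forecast, clim_forecast)]
    aa = [a - c for a, c in zip(analysis, clim_analysis)]
    dot = sum(x * y for x, y in zip(fa, aa))
    norm = math.sqrt(sum(x * x for x in fa) * sum(y * y for y in aa))
    return dot / norm

# Hypothetical residual time series at one grid point.
mse, bias_sq, non_const = decompose_mse([1.0, 3.0, 2.0, 2.0])
print(mse, bias_sq, non_const)  # mse equals bias_sq + non_const

# A forecast identical to the analysis has AC = 1.
field = [1.0, 2.0, 4.0]
clim = [0.0, 1.0, 1.0]
ac_perfect = anomaly_correlation(field, field, clim, clim)
```

Subtracting a constant from every residual, as offline correction does, changes only `bias_sq` and leaves `non_const` untouched.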

3. Results

[11] The online corrected model is used to generate 7-dy forecasts of every 6-hr state during the testing period of January 1986–1990. The average forecast residual for the online corrected model, namely $\langle \delta x_{\Delta t} \rangle$, is calculated again for the testing period, as it was previously for the training period. Figures 1a and 1c show the mean residual $\langle \delta x_6 \rangle$ of M and Figures 1b and 1d show the difference between the mean residual of forecasts made by $M_1$ and M at $\Delta t$ = 6-hrs for the zonal component of the wind at 200 hPa, u (Figures 1a and 1b), and the temperature T at 850 hPa (Figures 1c and 1d). During the training period, the SPEEDY model underestimates the poleward side of the westerly jet east of the Himalayan mountains by 2–5 [m/s] after 6-hrs (not shown) and during the testing period the average 6-hr errors are similar but not identical (Figure 1a). As a result, during the testing period, the online corrected model produces a stronger jet due to the increased forcing suggested by the residual, and the resulting 6-hr forecasts are about 1.5 [m/s] more accurate than those made by the original model. As indicated by the prevalence of cool colors, the online correction to M improves the 6-hr model climatology in most locations (Figure 1b). During the training period, forecasts made by M are generally too cool (by as much as 3 [K] near the South Pole) except at the North Pole (not shown). During testing, $M_1$ is about 2 [K] more accurate in these regions and has a uniformly positive or neutral impact (shown by the reduction of errors, cool colors in Figure 1d).

Figure 1.

Mean forecast error of the uncorrected model M and improvement exhibited by forecasts made by the online corrected model $M_1$ over those generated by M when verifying on 5-yrs of independent data. (a) Forecast error is shown in color for u [m/s] and (c) T [K] at a lead time of 6-hrs. Negative (positive) values indicate areas where M over (under) estimates u and T. (b,d) Negative (positive) values indicate areas where $M_1$ exhibits smaller (larger) forecast error than M. Improvements in temperature are most evident near the poles, where up to 8 [K] of accuracy is gained by a lead time of 7-dys (not shown). Contours show the Reanalysis climatology $\langle x^a \rangle$.

[12] The biases of the online and offline corrected models are compared in Figure 2, which shows the zonally averaged systematic errors of the original model (Figures 2a–2f), the improvement in bias exhibited by the online and offline corrected models (Figures 2g–2r) and the difference between the improvement exhibited by online and offline correction (Figures 2s–2x). The latitude-height fields of wind and temperature are shown at lead times of 1, 3, and 5-dys. The original model underestimates u near the poles at low levels, and in mid-latitudes by day 5, by up to 4 [m/s] (Figure 2b). Both online and offline corrected forecasts are up to 4 [m/s] better at low-level mid-latitudes (Figures 2h and 2n). The original model is too warm (cold) at lower (upper) levels near the South Pole, and too cold at all levels near the North Pole (Figures 2d–2f). Biases at both poles are significantly improved by online and offline correction, by up to 3 [K] by day 3 (Figures 2k and 2q), and the improvement increases by day 5, when the biases are even larger. Overall, the fact that Figures 2g–2l are quite similar to Figures 2m–2r indicates that the online correction, estimated using just the 6-hr residuals, succeeds in reproducing quite well the nonlinear evolution of the bias as estimated by the offline correction.

Figure 2.

(a–f) Bias of the uncorrected model M, (g–l) improvement exhibited by forecasts made by the online corrected model $M_1$ and (m–r) offline corrected models over those generated by M when verifying on 5-yrs of independent data, and (s–x) the difference between the improvement exhibited by online and offline correction. Zonally averaged forecast error is shown at all levels for u [m/s] and T [K] at lead times of 1, 3, and 5-dys. In Figures 2g–2r negative (positive) values indicate areas where $M_1$ and offline correction exhibit smaller (larger) forecast error than M. Online correction (Figures 2g–2l) performs as well as offline (Figures 2m–2r) in most regions and reduces the model bias substantially, especially at the poles. Note that Figures 2m–2r are $\left| \langle \delta x_6 \rangle_{1986\text{–}90} - \langle \delta x_6 \rangle_{1981\text{–}85} \right| - \left| \langle \delta x_6 \rangle_{1986\text{–}90} \right|$. In Figures 2s–2x warm colors indicate areas where offline correction outperformed online correction in reducing the bias.
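The quantity plotted for the offline correction in the caption above, the absolute bias remaining after subtracting the training-period bias minus the absolute bias of the uncorrected model, can be written as a one-line function. The function name and the scalar example values are hypothetical.

```python
def offline_bias_improvement(bias_test, bias_train):
    """|<dx>_test - <dx>_train| - |<dx>_test| at one grid point.
    Negative values mean the offline correction reduced the bias."""
    return abs(bias_test - bias_train) - abs(bias_test)

# Training bias close to the testing bias: most of the bias is removed.
good = offline_bias_improvement(bias_test=3.0, bias_train=2.5)

# Training bias much larger than the testing bias: the correction overshoots.
bad = offline_bias_improvement(bias_test=1.0, bias_train=3.0)
print(good, bad)
```

The second case shows why the sign convention matters: a correction estimated during training can increase the bias where the testing-period error differs strongly from the training-period error.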

[13] Finally, we measure the impact that the online correction has on the nonlinear evolution of the forecast errors (both bias and non-constant). Figure 3 shows the non-constant error, namely $\langle (\delta x'_{\Delta t})^2 \rangle$ (see equation (1)), for the same quantities plotted in Figures 2a–2f. Unlike the bias, the non-constant errors of the original model exhibit little spatial structure. Nevertheless, the online correction is still able to reduce the time-averaged standard deviation of the u errors by up to 2 [m/s], and T errors by up to 2 [K]. Note that the offline correction is by definition unable to correct the non-constant errors. The superior performance of online correction is further illustrated in Table 1, which shows the global average RMSE for forecasts made by the three models. By day 3, the empirically corrected forecasts are roughly as accurate as the original model forecasts at 1-dy lead.

Figure 3.

(left) Non-constant (state-dependent and random) errors of the uncorrected model M and (right) improvement exhibited by forecasts made by the online corrected model over those generated by the original model M when verifying on 5-yrs of independent data. Note that the offline corrected model has the same non-constant error as the uncorrected model. A clear advantage to online correction is observed, especially for upper level winds.

Table 1. Global Average RMSE for the Original, Online Corrected, and Offline Corrected Model Predictions (a)

| Variable | Lead Time (hrs) | Original | Online | Offline | Bias Fraction |
|---|---|---|---|---|---|
| u @ 200 hPa [m/s] | 24 | 3.33 | 2.15 | 2.62 | 14% |
| | 72 | 4.58 | 3.38 | 3.80 | 15% |
| | 120 | 5.00 | 4.04 | 4.30 | 25% |
| T @ 850 hPa [K] | 24 | 1.14 | 0.96 | 0.99 | 40% |
| | 72 | 2.06 | 1.71 | 1.73 | 31% |
| | 120 | 2.59 | 2.15 | 2.11 | 24% |
| Z @ 500 hPa [m] | 24 | 32.29 | 24.91 | 26.45 | 40% |
| | 72 | 55.27 | 42.09 | 44.83 | 21% |
| | 120 | 68.27 | 56.24 | 57.27 | 17% |

(a) Global average RMSE, contribution of zonal average at latitude ϕ weighted by cos ϕ. With the exception of 5-dy lead forecasts of temperature at 850 hPa, online correction beats offline. The bias fraction for the original model (squared bias/MSE) increases with time for u, indicating that random errors constitute a decreasing component of u forecast errors.
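The cos ϕ weighting and the bias fraction mentioned in the table footnote can be sketched as below. The source does not say whether the weights are applied to the zonal-mean MSE or to the RMSE, so weighting the MSE before taking the square root is an assumption of this example, as are the function names and the sample values.

```python
import math

def global_rmse(zonal_mse_by_lat):
    """cos(latitude)-weighted mean of zonal-mean MSE, then square root.
    Keys are latitudes in degrees. Weighting the MSE is an assumption."""
    weights = {lat: math.cos(math.radians(lat)) for lat in zonal_mse_by_lat}
    total = sum(weights.values())
    mse = sum(weights[lat] * m for lat, m in zonal_mse_by_lat.items()) / total
    return math.sqrt(mse)

def bias_fraction(squared_bias, mse):
    """Share of the mean squared error explained by the squared bias."""
    return squared_bias / mse

# Hypothetical zonal-mean MSE values (latitude in degrees -> MSE).
uniform = {-60.0: 4.0, 0.0: 4.0, 60.0: 4.0}
rmse_uniform = global_rmse(uniform)  # close to 2.0 when the MSE is uniform

# Polar errors count for less than tropical ones under cos(latitude) weighting.
polar_heavy = {-80.0: 9.0, 0.0: 1.0, 80.0: 9.0}
rmse_weighted = global_rmse(polar_heavy)
rmse_unweighted = math.sqrt(sum(polar_heavy.values()) / len(polar_heavy))

frac = bias_fraction(squared_bias=1.0, mse=4.0)
print(rmse_uniform, rmse_weighted, rmse_unweighted, frac)
```

The weighting reflects the smaller area represented by each grid point near the poles, so large polar errors contribute less to the global figure than an unweighted average would suggest.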

[14] DelSole et al. [2008] propose that state-independent correction can improve the random error only if the bias is large, i.e., the model has an incorrect basic tendency. For more sophisticated models the basic tendency is quite accurate, and the empirical correction is likely to be smaller and thus less capable of improving random errors. Table 1 shows the bias fraction, or ratio of squared bias to MSE, indicating that random errors are responsible for the majority of MSE in the original SPEEDY model. Yang et al. [2008] found the 5-dy bias fraction of the NCEP GFS to be 10% in u and 25% in T during March 2008. While these fractions are only slightly smaller than those reported in Table 1, the magnitude of MSE is much smaller for the GFS, so nonlinear growth of bias is less of a problem than it is for lower resolution models.

[15] Table 2 shows the improvement in AC score (time to cross AC = 0.6) for online and offline empirical correction, relative to the original model. The online improvement is largest for u, and generally decreases in later years of the testing period. This suggests that the bias of the original model, estimated during training, may become somewhat less correlated with the bias observed during testing as the interval of time between training and testing grows. In fact, using shorter training periods (on the order of a month) may be more effective in correcting the bias, especially for more sophisticated models.

Table 2. Improvement (a) in Crossing Time of AC = 0.6 for Online and Offline Empirical Correction, Relative to the Crossing Time of Forecasts Made by the Original Model M

| Year | Variable | Online | Offline |
|---|---|---|---|
| 1986 | u @ 200 hPa | 31 hrs (133%) | 22 hrs (93%) |
| | T @ 850 hPa | 22 hrs (44%) | 22 hrs (44%) |
| | Z @ 500 hPa | 32 hrs (63%) | 30 hrs (59%) |
| 1987 | u @ 200 hPa | 33 hrs (105%) | 18 hrs (56%) |
| | T @ 850 hPa | 21 hrs (45%) | 19 hrs (41%) |
| | Z @ 500 hPa | 28 hrs (40%) | 22 hrs (31%) |
| 1988 | u @ 200 hPa | 26 hrs (91%) | 18 hrs (61%) |
| | T @ 850 hPa | 8 hrs (11%) | 5 hrs (7%) |
| | Z @ 500 hPa | 21 hrs (31%) | 23 hrs (33%) |
| 1989 | u @ 200 hPa | 22 hrs (73%) | 18 hrs (61%) |
| | T @ 850 hPa | 18 hrs (36%) | 21 hrs (42%) |
| | Z @ 500 hPa | 17 hrs (16%) | 9 hrs (9%) |
| 1990 | u @ 200 hPa | 25 hrs (83%) | 18 hrs (59%) |
| | T @ 850 hPa | 23 hrs (52%) | 21 hrs (47%) |
| | Z @ 500 hPa | 35 hrs (71%) | 28 hrs (57%) |

(a) For example, forecasts of geopotential height Z at 500 hPa for Jan 1986 made by M were useful (AC > 0.6) for approximately 51-hrs. Forecasts made by the online corrected model $M_1$ remained useful for 83-hrs (i.e., 32-hrs longer; see the 1986 Z entry in the Online column). Empirically corrected forecasts outperformed those of the original model in every case, and online corrected forecasts outperformed offline corrected forecasts with few exceptions.
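The crossing-time metric in Table 2 can be sketched as follows. The source does not state how the crossing time was interpolated between 6-hourly forecasts, so the linear interpolation here is an assumption, and the function names and the sample AC curve are hypothetical. The last lines reproduce the arithmetic of the footnote example (51 hrs versus 83 hrs).

```python
def crossing_time(lead_hours, ac_values, threshold=0.6):
    """First lead time at which AC falls to the threshold, using linear
    interpolation between bracketing samples (an assumed method).
    Returns None if AC never crosses the threshold."""
    pairs = list(zip(lead_hours, ac_values))
    for (t0, a0), (t1, a1) in zip(pairs, pairs[1:]):
        if a0 > threshold >= a1:
            return t0 + (a0 - threshold) / (a0 - a1) * (t1 - t0)
    return None

def improvement(t_original, t_corrected):
    """Gain in useful forecast length, in hours and as a percentage."""
    gain = t_corrected - t_original
    return gain, 100.0 * gain / t_original

# Hypothetical AC curve sampled every 6 hrs.
t_cross = crossing_time([0, 6, 12], [1.0, 0.8, 0.4])

# Footnote example: Z at 500 hPa, Jan 1986, original vs. online corrected.
gain, pct = improvement(51.0, 83.0)
print(t_cross, gain, round(pct))
```

The percentages in Table 2 are relative to the crossing time of the original model, which is why the same gain in hours can correspond to quite different percentages for different variables.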

4. Discussion

[16] The findings presented in this study suggest that both online and offline correction show a similar performance in reducing bias. However, when considering the total forecast error (both bias and non-constant), online corrected forecasts have a smaller RMSE and a larger AC than both biased and offline corrected forecasts. The improvement is attributed to the ability of online correction to reduce nonlinear error growth. Empirical correction resulted in large improvements in zonal wind (larger than temperature). One potential explanation is that the SPEEDY model representation of the atmosphere has a large temperature bias, and consequently leaves room for improvement in u through the thermal wind balance. In operational models whose temperature biases are relatively small, empirical correction of temperature does not appreciably reduce the bias or random errors in u [DelSole et al., 2008; Yang et al., 2008].

[17] Provided the model bias estimated during training is similar enough to the bias exhibited during testing, online correction will be more effective than offline due to its ability to reduce the cumulative nonlinear growth of state-dependent and random model errors. However, these results may be optimistic in that more sophisticated models will have smaller biases, and thus will comparatively suffer less from nonlinear growth of the bias.

[18] Estimation of state-dependent model errors is also an important component of empirical correction, and should be investigated further [Danforth and Kalnay, 2008]. Future development will require using a more realistic model, testing the impact on longer range forecasts and other variables (e.g., precipitation), and testing the impact on an ensemble of reforecasts [Hamill et al., 2006]. This method could also be used to compare the NCEP and ERA-40 reanalyses (the better reanalysis should lead to better bias corrections), and to include online correction of model errors within data assimilation [Li, 2007].

Acknowledgments

[19] This research was supported by a NOAA THORPEX grant NOAA/NA040AR4310103, a NASA Phase-II grant NNG 06GE87G to the Vermont Advanced Computing Center, a VT NSF-EPSCoR grant, and a VT NASA-EPSCoR grant.
