## 1. Introduction

[2] Weather and climate predictions have improved dramatically over the last decade as a result of powerful computers being coupled with increasingly sophisticated data assimilation and ensemble forecasting techniques. However, faster computers and more accurate probabilistic estimates of the current state of the atmosphere are *necessary*, but not *sufficient* conditions for improved forecasts. Model errors, which include imperfect numerical discretizations of the equations of motion and deficiencies in the parameterizations used to represent the effect of sub-grid scale physical processes, result in systematic forecast errors and constitute an important component of the uncertainty observed in weather and climate predictions. As the methods of data assimilation, ensemble forecasting, and observing the Earth's climate become more sophisticated, the impact of model deficiencies becomes relatively more important [*Kalnay*, 2003].

[3] Mathematical methods for increasing the usefulness of forecasts made by a General Circulation Model (GCM) can be grouped into three categories: those which aim to (a) improve the initial state by optimally combining observations with forecasts (i.e., construction of analysis states by data assimilation [*Anderson*, 2001; *Hunt et al.*, 2004; *Danforth and Yorke*, 2006]), (b) improve estimates of forecast uncertainties by estimating the growing errors of the day (e.g., ensemble forecasting [*Toth and Kalnay*, 1993]), and (c) identify and reduce the systematic model error (e.g., bias correction [*Leith*, 1978; *Klinker and Sardeshmukh*, 1992]).

[4] Model error can be diagnosed by separating the time series of short forecast residuals (the difference between an analysis and the corresponding forecast) into state-independent (constant), state-dependent (function of model state), and random (noise) components. This study focuses on estimation of the constant component of model error, generally referred to as the bias. Corrections of the bias can be made either *offline*, after the forecast has been created [*Glahn and Lowry*, 1972], or *online*, by nudging the model forcing during the integration. One advantage of making the corrections offline, by far the most commonly used method in operational numerical weather or climate prediction, is its simplicity: for each *N*-hr forecast lead time, one adds the mean *N*-hr residual (error correction) estimated during the training period. This correction is easily estimated if enough forecasts and verifying analyses are available. A disadvantage is that after a short time, errors grow nonlinearly, and the correction of averaged nonlinear errors obscures their physical origin.
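As an illustration of the offline procedure, the following minimal sketch estimates the mean residual at each lead time from a training set and adds it to the forecasts. The array layout `(n_cases, n_leads, n_state)`, the function names, and the synthetic data are assumptions made for this example; they are not taken from the study.

```python
import numpy as np

def estimate_offline_bias(forecasts, analyses):
    """Mean residual (analysis minus forecast) for each lead time.

    Both inputs are assumed to have shape (n_cases, n_leads, n_state).
    The result has shape (n_leads, n_state).
    """
    residuals = analyses - forecasts
    return residuals.mean(axis=0)

def apply_offline_correction(forecasts, mean_residual):
    """Add the lead-time-dependent mean residual to the forecasts."""
    return forecasts + mean_residual

# Synthetic training set (hypothetical): a bias that grows with lead time.
rng = np.random.default_rng(0)
n_cases, n_leads, n_state = 200, 4, 3
analyses = rng.normal(size=(n_cases, n_leads, n_state))
true_bias = np.array([0.5, 1.0, 1.5, 2.0])[:, None] * np.ones(n_state)
noise = 0.1 * rng.normal(size=analyses.shape)
forecasts = analyses - true_bias + noise

mean_residual = estimate_offline_bias(forecasts, analyses)
corrected = apply_offline_correction(forecasts, mean_residual)
```

A separate correction field is needed for every lead time, which is why `mean_residual` carries a lead-time axis. In practice the correction would be applied to forecasts outside the training period.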

[5] An advantage of online correction is that the nonlinear growth of the bias is reduced during the integration, decreasing the cumulative effect of model error. Also, correction fields need only be computed for a single forecast length. Online correction provides continuously corrected forecasts at all lead times, and can be considered as an interim, empirical estimation of the errors in the model. A possible disadvantage of online correction is that the estimated residual added to the model forcing, if large, may interfere with the physical balance of model variables (e.g., geostrophy). Another concern is that model parameters may have been tuned to minimize errors in the (biased) model tendency, and may be less than optimal for the online corrected model. However, the errors introduced by these very parameterizations are the focus of the correction, and improvements of the parameterizations (or the numerical discretizations) can be tested by the extent to which they reduce the magnitude of empirical corrections. The parameter corrections necessitated by the model may even suggest physically meaningful sources of model error.
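The idea of online correction can be sketched with a toy scalar model. Everything here is an assumption made for illustration: the linear dynamics, the forward Euler scheme, the parameter values, and in particular the choice to convert the mean short-forecast residual into a tendency correction by dividing it by the lead time. This is not the GCM configuration used in the study.

```python
def step(x, forcing, dt):
    """One forward Euler step of the toy model dx/dt = -x + forcing."""
    return x + dt * (-x + forcing)

def integrate(x0, forcing, dt, n_steps):
    """Integrate the toy model for n_steps from the initial state x0."""
    x = x0
    for _ in range(n_steps):
        x = step(x, forcing, dt)
    return x

F_TRUE, F_MODEL = 1.0, 0.8   # hypothetical "nature" and biased model forcing
DT, LEAD_STEPS = 0.01, 10    # short training forecasts of length 0.1
lead_time = DT * LEAD_STEPS

# Training: launch a short model forecast from each analysis and compare it
# with the next analysis (here taken from the "nature" run).
analyses = [3.0]
for _ in range(50):
    analyses.append(integrate(analyses[-1], F_TRUE, DT, LEAD_STEPS))
residuals = [
    analyses[i + 1] - integrate(analyses[i], F_MODEL, DT, LEAD_STEPS)
    for i in range(len(analyses) - 1)
]
mean_residual = sum(residuals) / len(residuals)

# Assumption: spread the mean residual evenly over the lead time and add it
# to the model forcing at every step of the integration.
tendency_correction = mean_residual / lead_time

# A longer forecast, with and without the online correction.
n_long = 500
truth_final = integrate(3.0, F_TRUE, DT, n_long)
uncorrected_final = integrate(3.0, F_MODEL, DT, n_long)
corrected_final = integrate(3.0, F_MODEL + tendency_correction, DT, n_long)
```

Only one correction field is estimated, from a single short forecast length, yet it acts at every step of the longer integration. In this linear toy the correction recovers most of the forcing error; a nonlinear GCM offers no such guarantee.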

[6] Empirical correction has demonstrated varied performance in previous studies; some show that state-independent correction reduces the random forecast error [*Johansson and Saha*, 1989; *Yang and Anderson*, 2000; *Danforth et al.*, 2007], while others find that the random error is unchanged [*Saha*, 1992; *DelSole and Hou*, 1999]. *DelSole et al.* [2008] suggest that the discrepancy arises because the resolutions of the GCMs used by the studies vary, leading the dynamical error components to have vastly different magnitudes. For example, in a toy model large bias corrections can reduce the random errors, while in a state-of-the-art model the bias correction is likely to be much smaller and thus have less impact on random errors. This study compares the performance of online and offline empirical correction using a low-resolution GCM.