The impact of model fidelity on seasonal predictive skill



[1] The relationship between the quality of a general circulation model's (GCM's) representation of present climate and its predictive skill on seasonal time scales is investigated by a series of GCM experiments. A novel procedure is developed to improve the quality of a GCM's present-day control climate by applying cyclostationary annually varying run-time “bias corrections”. Application of this procedure to the Canadian Centre for Climate Modelling and Analysis third generation atmospheric GCM (AGCM3) is shown to result in a significant reduction of time-mean biases in wind, temperature and humidity fields in its simulation of present-day climate. Furthermore, it is found that this cyclostationary correction leads to improved variability on seasonal time scales. The ability to improve a GCM's properties in this way allows a careful assessment of the relationship between a model's fidelity and its predictive skill. In this study, the potential predictive skill on seasonal time scales is assessed by performing ensemble simulations with the observed sea surface temperatures and sea-ice distribution. The analysis indicates that the increase in model fidelity associated with the application of the bias correction results in a general increase in predictive skill on seasonal time scales. To investigate this result further, an additional set of ensemble forecasts is performed with the sign of the bias correction reversed, thereby degrading model fidelity. In this additional experiment the corresponding predictive skill is also degraded. The results of this study have implications with regard to the application and interpretation of model metrics for climate GCMs.

1. Introduction

[2] General circulation models are the primary tool employed to make predictions of the future state of the climate system on time scales ranging from days and months to centuries. In the application of GCMs for this purpose, one of the basic assumptions is that the quality of a model's present-day climate relative to the observations is a good indicator of the credibility, or predictive skill, of its future predictions. It is this assumption that underlies the substantial community activity surrounding the development and application of metrics, which seeks to more quantitatively characterize a model's fidelity with respect to the observed record [Gleckler et al., 2008]. In this study, model fidelity is characterized in terms of the accuracy of the simulated present-day time-mean temperature, wind and humidity fields, and their inter-annual variability. The correlation between model fidelity and predictive skill is not easily verified, however, and an improved understanding of their relationship is essential to guide the community application of model metrics.

[3] To investigate this issue, we develop a procedure to enhance a model's fidelity with respect to the observed climate and investigate the impact on predictive skill for seasonal time scales. The first step involves the derivation of a cyclostationary annually-varying forcing which, when applied to a GCM, reduces its time-mean biases relative to observations. We refer to this procedure as run-time empirical bias correction and verify that its application results in a quantitative improvement in model fidelity both in terms of the time mean and the variance on seasonal time scales. The second step involves determining the impact of the bias correction on predictive skill through analysis of seasonal hindcasts from both the standard and bias-corrected version of the GCM. A key aspect of the present study is the ability to keep the underlying GCM unchanged (i.e., numerics, resolution, physical parameterizations) allowing all changes in skill to be directly related to changes in fidelity.

2. Methodology and Model Experiments

[4] The Canadian Centre for Climate Modelling and Analysis third generation atmospheric general circulation model (AGCM3) [Scinocca et al., 2008] is used for all simulations in this study. It is a spectral general circulation model that employs 31 vertical levels that extend from the surface up to roughly 1 hPa. The spectral representation of horizontal fields is triangularly truncated at the total wave number of 63 (T63). All runs are performed in the Atmospheric Model Intercomparison Project (AMIP) style configuration employing the observed sea surface temperature and sea-ice distribution from the HadISST1.1 dataset [Rayner et al., 2003] over the period 1989–2008.

[5] Present-day control climate biases of 500 hPa geopotential height (z500), zonal-mean temperature ([ta]), and zonal-mean zonal winds ([ua]) for AGCM3 in the December-January-February (DJF) season relative to the ERA interim reanalysis in 1989–2008 [Dee et al., 2011] are displayed in Figure 1 (middle). (Here and below the square brackets denote zonal averages.) The model biases in other seasons (not shown) have a different spatial structure but are of roughly equivalent magnitude.

Figure 1.

Model mean biases of DJF 500 hPa (top) geopotential height z500, m, (middle) zonal-mean temperature [ta], K, and (bottom) zonal-mean zonal winds [ua], m/s, with respect to the 1989–2008 ERA interim reanalysis in the (left) bias-improved, (middle) control, and (right) bias-degraded experiments. The magnitude of the biases as measured by RMSEmean is indicated in the panel titles.

[6] The annual cycle of the run-time bias corrections is derived as follows. An ensemble of five “adaptation” runs is performed for years 1989–2008 in which the winds, temperature and humidity in AGCM3 are relaxed toward the corresponding six-hourly ERA interim reanalysis fields interpolated to model levels and model time step. The adaptation runs are symbolically represented as

$$\frac{\partial X}{\partial t} = F(X) - \frac{X - X_R}{\tau}$$

where X represents model prognostic variables such as winds, temperature, and humidity, F(X) represents the model dynamics and physics, and the last term is the relaxation tendency toward the reference state XR with time scale τ. The relaxation is applied at each model level to large spatial scales only with total wavenumber n ≤ 21 and with a time scale τ of 24 hours so as to constrain primarily the synoptic- and larger-scale variability. These particular settings are selected so that the magnitude of instantaneous 6-hourly differences between the GCM and ERA interim reanalysis data is comparable to the typical magnitude of differences between different reanalyses. The relaxation time scale for humidity is set to the weaker value of 36 hours to accommodate greater uncertainties in humidity observations. This relaxation procedure is akin to a nudging “data assimilation” technique in which relaxation tendencies keep the model state close to the observations.

[7] Relaxation tendencies are saved every 6 hours during the adaptation runs. A representative annual cycle of daily relaxation forcing is constructed by averaging the tendencies for each calendar day for all years in the 20-yr period 1989–2008 and all 5 ensemble members. The obtained annual cycle estimate is then temporally smoothed by a spectral discrete Gaussian filter that retains roughly the lowest three annual harmonics. The result is a bias correction in the form of a three dimensional smoothed annual cycle of forcing for the winds, temperatures, and humidity. The bias-correcting runs are symbolically written as
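The construction of the smoothed annual cycle can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function name `annual_cycle`, the 365-day calendar, the array layout, and the width of the Gaussian taper are all assumptions chosen for illustration, since the text states only that roughly the lowest three annual harmonics are retained.

```python
import numpy as np

def annual_cycle(tendency, n_harmonics=3):
    """Estimate a smoothed annual cycle from daily tendencies.

    tendency: array of shape (n_members, n_years, n_days, ...), where
    n_days is the (assumed fixed) number of calendar days per year.
    Returns an array of shape (n_days, ...).
    """
    # Average over ensemble members and years for each calendar day.
    daily_mean = tendency.mean(axis=(0, 1))
    n_days = daily_mean.shape[0]

    # Decompose the calendar-day axis into harmonics of the annual period.
    spectrum = np.fft.rfft(daily_mean, axis=0)
    k = np.arange(spectrum.shape[0])

    # Gaussian taper in harmonic space. The width is an assumption: the
    # weight is 1 for the annual mean and exp(-1/2) at harmonic n_harmonics.
    weights = np.exp(-0.5 * (k / n_harmonics) ** 2)
    weights = weights.reshape((-1,) + (1,) * (daily_mean.ndim - 1))

    # Back to calendar days.
    return np.fft.irfft(spectrum * weights, n=n_days, axis=0)
```

Because the taper leaves the zeroth harmonic untouched, the annual mean of the estimate equals the annual mean of the unsmoothed calendar-day average; only the day-to-day sampling noise and the higher harmonics are damped.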

$$\frac{\partial X}{\partial t} = F(X) + G$$

where G is the empirical bias correction

$$G = -\mathcal{A}\left[\frac{X - X_R}{\tau}\right]$$

The operator $\mathcal{A}[X]$ stands for the annual cycle estimator of X as described above. Note that the tendency correction G does not explicitly depend on the model solution, and is a function of the annual cycle only. It can be interpreted as an imposed cyclostationary forcing that counteracts the cumulative effects of model deficiencies.

[8] There is some dependence of the resulting cyclostationary corrections on the relaxation time scale τ and other parameters such as the spatial scales of the variables that are relaxed in the adaptation runs. Ideally, one would like to select an “optimal” parameter set that maximizes the predictive skill, if one is interested in climate predictions. In the absence of an efficient strategy for making these optimal choices, we selected the parameters based on the observational uncertainty argument outlined above. This may not be an optimal strategy for skill optimization but, as we demonstrate below, even this simple approach leads to model fidelity and predictive skill improvements.

[9] Three sets of 10-member ensemble AMIP-style simulations are performed over the 20-yr period 1989–2008. In the first, or control set, the standard version of AGCM3 is used. In the other two sets, hereafter referred to as the bias-improved and bias-degraded runs, the run-time bias-correcting tendencies G are, respectively, added to and subtracted from the right-hand side of the model equations. This is done to examine the linearity of the model response to the bias-correcting tendencies, and to increase the likelihood of detecting small effects by comparing bias-improved runs to bias-degraded runs.
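The experimental design can be illustrated with a scalar toy model. It is only a sketch under strong assumptions: the linear "model" `model_tendency`, its damping time `T_MODEL`, its built-in offset `BIAS`, and the sinusoidal `reference` state are hypothetical stand-ins for AGCM3 and the reanalysis, and the calendar-day average is used without further smoothing. An adaptation run is relaxed toward the reference, the calendar-day mean of the saved relaxation tendencies gives G, and three free runs then use no forcing, +G, and −G.

```python
import numpy as np

DAYS = 365      # days per model year
T_MODEL = 5.0   # damping time scale of the toy model in days (hypothetical)
TAU = 1.0       # relaxation time scale in days (24 hours, as in the text)
BIAS = 2.0      # built-in equilibrium bias of the toy model (hypothetical)

def reference(day):
    """Stand-in for the reanalysis state X_R: a pure annual cycle."""
    return np.sin(2.0 * np.pi * day / DAYS)

def model_tendency(x, day):
    """Toy F(X): damps X toward a biased copy of the reference state."""
    return -(x - reference(day) - BIAS) / T_MODEL

def run(n_years, forcing=None, nudge=False, dt=0.25):
    """Forward-Euler integration. Returns end-of-day states and daily-mean
    relaxation tendencies, each with shape (n_years, DAYS)."""
    steps = int(round(1.0 / dt))
    x = reference(0.0)
    states = np.empty((n_years, DAYS))
    relax = np.zeros((n_years, DAYS))
    for year in range(n_years):
        for d in range(DAYS):
            for s in range(steps):
                day = d + s * dt
                # Relaxation tendency toward the reference (adaptation run only).
                r = -(x - reference(day)) / TAU if nudge else 0.0
                # Prescribed forcing for this calendar day (bias correction).
                g = forcing[d] if forcing is not None else 0.0
                x = x + dt * (model_tendency(x, day) + r + g)
                relax[year, d] += r / steps
            states[year, d] = x
    return states, relax

# Adaptation run: relax toward the reference and save the tendencies.
_, relax = run(5, nudge=True)
# Calendar-day average over years, skipping the first (spin-up) year.
G = relax[1:].mean(axis=0)

def mean_bias(forcing):
    """Time-mean difference from the reference, skipping the spin-up year."""
    states, _ = run(5, forcing=forcing)
    end_of_day = np.arange(DAYS) + 1.0
    return float((states[1:] - reference(end_of_day)).mean())

bias_control = mean_bias(None)
bias_improved = mean_bias(+G)
bias_degraded = mean_bias(-G)
print(bias_improved, bias_control, bias_degraded)
```

In this linear toy the response to ±G is exactly symmetric about the control bias, and the correction removes most but not all of the bias, because the nudged state in the adaptation run itself retains a residual offset from the reference. A full GCM need not respond linearly, which is what comparing the bias-improved and bias-degraded runs is meant to reveal.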

3. Results

[10] The impact of the application of the bias correction on the time-mean climate in AGCM3 is illustrated in Figure 1 (left). The titles in the panels display the magnitude of the model mean biases relative to the ERA interim reanalysis as measured by the spatially averaged root-mean-square error (RMSE) defined as $\mathrm{RMSE}_{mean} = \langle (\overline{X}_M - \overline{X}_R)^2 \rangle^{1/2}$, where $\overline{X}_M$ is the model climate, $\overline{X}_R$ is the ERA interim climate, and the angular brackets denote spatial averaging (such as over the globe and/or in the vertical domain). The 500 hPa geopotential height, zonal-mean temperature, and zonal-mean zonal winds all show a significant reduction in the bias magnitude. In contrast, the magnitude of the model biases is increased substantially in the bias-degraded runs as displayed in Figure 1 (right), while their spatial distribution remains roughly the same as in the control simulation.
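The spatially averaged RMSE of the mean bias can be sketched as follows for a single latitude-longitude field. The function name `rmse_mean` and the cosine-of-latitude area weighting on a regular grid are assumptions made for illustration; the text specifies only that the squared difference between the two climates is spatially averaged.

```python
import numpy as np

def rmse_mean(model_clim, ref_clim, lat):
    """Area-weighted RMSE between two climatologies of shape (n_lat, n_lon).

    lat: latitudes in degrees, shape (n_lat,). Cosine-latitude weights
    approximate grid-cell area on a regular grid (an assumption here).
    """
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(model_clim)
    sq_err = (model_clim - ref_clim) ** 2
    # Weighted spatial mean of the squared bias, then the square root.
    return float(np.sqrt(np.sum(weights * sq_err) / np.sum(weights)))
```

Dividing the value for a bias-improved or bias-degraded experiment by the value for the control run gives a normalized measure in which values below one indicate a reduced mean bias.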

[11] Figure 2 (left) summarizes the magnitude of global model biases for several variables in the adaptation (yellow), bias-improved (pink), control (green) and bias-degraded (blue) runs as measured by RMSEmean. The bias RMSEs are estimated for each of the four standard seasons DJF, MAM, JJA, SON, and then aggregated into a single annual value, which is then normalized by the corresponding value in the control run. Values of RMSEmean < 1 indicate improvements in the model climatology while values > 1 indicate degradation of model biases. The variables considered are mean sea level pressure (psl), near-surface air temperature (tas), total precipitation rate (pr), 500 hPa geopotential height (z500), and zonal means of temperature ([ta]), zonal wind ([ua]), and specific humidity ([hus]). Biases in the zonal mean quantities are estimated on pressure levels and averaged between 100 and 1000 hPa. The vertical bars indicate the 95% confidence intervals estimated from the ensembles of runs.

Figure 2.

(left) The magnitude of model mean biases over the globe with respect to the 1989–2008 ERA interim reanalysis as measured by RMSEmean for mean sea level pressure (psl), near-surface air temperature (tas), precipitation rate (pr), 850 hPa temperature (t850), 500 hPa geopotential height (z500), zonal-mean temperature ([ta]), zonal-mean zonal wind ([ua]), and zonal-mean specific humidity ([hus]) in the adaptation (yellow), bias-improved (pink), control (green), and bias-degraded (blue) runs. All quantities are aggregated over four standard seasons and are normalized by the respective values in the control run. Vertical lines indicate the 95% confidence intervals estimated from the ensemble simulations. (right) The same as Figure 2 (left) but for the magnitude of the bias in the interannual standard deviation of seasonal means as measured by RMSEσ for sea level pressure (σpsl), near-surface screen temperature (σtas), 850-, 500-, and 200-hPa temperature (σt850, σt500, σt200) and geopotential height (σz850, σz500, σz200).

[12] As expected, the magnitude of the model bias is reduced, by as much as 50% depending on the variable, in the bias-improved runs but it is increased in the bias-degraded runs. The dependence of the bias magnitude on the sign of the bias-correcting tendencies is nearly linear although the application of the corrections with the opposite sign tends to degrade the model climate to a greater extent as compared to bias improvements. The adaptation runs provide a lower bound on model biases for the bias-improved runs, due to the strong influence of their state-dependent relaxation toward reanalysis values.

[13] By construction, the greatest changes are found for variables that are directly influenced by the bias corrections in the model equations, namely temperature, winds, and specific humidity. Smaller but still significant changes are found for quantities that are not directly bias corrected, such as near-surface temperature and precipitation. It is important to note that near surface temperatures over open oceans are close to the observed sea surface temperatures in AMIP-style simulations and therefore are less influenced by the application of bias corrections.

[14] The climate improvements that result from the application of the bias correction in the model are accompanied by improvements in the model-simulated interannual variability on seasonal time scales. This is illustrated in Figure 2 (right), which shows the magnitude of differences between the observed and model-simulated interannual standard deviation of seasonal means. These differences are quantified in terms of the root-mean-square difference from one of the ratio of the model interannual standard deviation σM to the corresponding standard deviation in the reanalysis σR, that is, $\mathrm{RMSE}_\sigma = \langle (\sigma_M/\sigma_R - 1)^2 \rangle^{1/2}$. The limiting case of RMSEσ = 0 corresponds to the situation where the modelled and observed variance patterns are identical. In the following, RMSEσ will also be referred to as the variance bias, for brevity. The four seasonal values of RMSEσ are aggregated into a single annual value and normalized by the corresponding values in the control run in Figure 2.
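A minimal sketch of the variance bias for one season and one model realization is given below. The function name `rmse_sigma`, the cosine-of-latitude weighting, and the use of the sample standard deviation are illustrative assumptions; how the ensemble members enter the estimate is not specified in the text and is not addressed here.

```python
import numpy as np

def rmse_sigma(model_seasonal, ref_seasonal, lat):
    """Variance bias from seasonal means of shape (n_years, n_lat, n_lon).

    Implements <(sigma_M / sigma_R - 1)^2>^(1/2) with an area-weighted
    spatial mean (cosine-latitude weights are an assumption).
    """
    # Interannual standard deviation at each grid point.
    sigma_m = model_seasonal.std(axis=0, ddof=1)
    sigma_r = ref_seasonal.std(axis=0, ddof=1)
    sq_err = (sigma_m / sigma_r - 1.0) ** 2
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(sq_err)
    return float(np.sqrt(np.sum(weights * sq_err) / np.sum(weights)))
```

Because the metric is based on the ratio of standard deviations, a model whose interannual anomalies are everywhere twice as large as observed scores exactly one, regardless of the absolute size of the variability.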

[15] The results indicate that there are noticeable and statistically significant improvements in the spatial distribution of the interannual variability on seasonal time scales in the bias-improved runs as measured by RMSEσ. In contrast, the spatial distribution of the interannual variance is degraded when the sign of the bias-correcting tendencies is reversed. The variance degradation is somewhat greater than the variance improvement, as one might anticipate from the greater mean bias deterioration in the bias-degraded runs. This is a non-trivial result in that the applied bias corrections are cyclostationary and so should have no direct impact on inter-annual variability in model simulations. The variance bias in the adaptation runs again provides a lower bound for maximum possible improvements.

[16] We now employ these AMIP-style runs to assess skill in predicting seasonal means. While these are not true forecasts, they provide estimates of the skill that could be attained by a GCM if the lower boundary conditions over the oceans could be predicted perfectly throughout the forecast period. Note that such AMIP-type simulations provide an estimate of skill that is associated with the lower boundary conditions only. The atmospheric initial conditions are, by construction, random and so irrelevant. On seasonal time scales beyond the deterministic predictability limit of a few weeks, the slowly varying SSTs and sea-ice distribution are thought to be one of the main sources of predictability for the atmosphere [e.g., Kharin and Zwiers, 2001].

[17] Figure 3 displays the spatially averaged temporal correlation skill scores of seasonal means of 500-hPa geopotential height in 1989–2008 for each of the four standard seasons and for all seasons combined in the three sets of model simulations. The correlation skill estimates are obtained for the northern and southern extratropics (polewards of 20°N and 20°S), the tropics (20°S–20°N) and the whole globe, and are subject to substantial sampling uncertainties, especially in the extratropics, as indicated by the vertical bars representing 95% boot-strapped confidence intervals of the correlation differences relative to the control run. Even so, the predictive skill tends to be greater in the bias-improved runs, especially when compared to that in the bias-degraded simulations, although the differences may not always be statistically significant in individual seasons and regions. Maps of the spatial distribution of the correlation skill changes (not shown) do not reveal a clear pattern due to large sampling errors of correlations estimated from 20-yr samples at the local grid scale. Spatial and temporal averaging is required to detect the fairly modest response in predictive skill due to changes in the model bias.
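The skill comparison can be sketched as follows. The text does not describe the bootstrap procedure, so resampling the years with replacement, the unweighted spatial mean, and the function names `correlation_skill` and `bootstrap_skill_difference` are assumptions made for illustration rather than the method actually used.

```python
import numpy as np

def correlation_skill(ens_mean, obs):
    """Spatially averaged temporal correlation.

    ens_mean, obs: seasonal means of shape (n_years, n_points).
    The spatial mean is unweighted here (an assumption).
    """
    fa = ens_mean - ens_mean.mean(axis=0)
    oa = obs - obs.mean(axis=0)
    r = (fa * oa).sum(axis=0) / np.sqrt((fa ** 2).sum(axis=0) * (oa ** 2).sum(axis=0))
    return float(r.mean())

def bootstrap_skill_difference(exp, ctl, obs, n_boot=1000, seed=0):
    """95% interval for skill(exp) - skill(ctl), resampling years with replacement."""
    rng = np.random.default_rng(seed)
    n_years = obs.shape[0]
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # The same resampled years are used for both experiments and the
        # observations, so the difference is a paired comparison.
        idx = rng.integers(0, n_years, size=n_years)
        diffs[i] = (correlation_skill(exp[idx], obs[idx])
                    - correlation_skill(ctl[idx], obs[idx]))
    return np.percentile(diffs, [2.5, 97.5])
```

Under these assumptions, an interval that excludes zero would indicate a skill difference relative to the control run that is unlikely to arise from sampling of the 20 years alone.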

Figure 3.

The spatially averaged correlation skill score of ensemble mean 500-hPa geopotential height (z500) in the (top left) northern extratropics, (top right) southern extratropics, (bottom left) tropics, and (bottom right) over the globe simulated in 1989–2008 in the control (green), bias-improved (red), and bias-degraded (blue) experiments. The correlation score is calculated for each of the four standard seasons (DJF, MAM, JJA, and SON), and for all four seasons together (ALL). Vertical bars indicate the boot-strapped 95% confidence intervals of the correlation differences relative to the control runs.

[18] There appears to be some regional dependence of the skill improvements. In particular, in the tropics, where the predictability in the control simulations is already very high, there are only modest skill enhancements. Most of the skill changes on a global scale seem to originate in the extratropics. Also, one should not expect any skill improvements in regions where the intrinsic potential predictability is low. Additionally, there could be a seasonal dependence of skill improvements. However, it is difficult to tease out such seasonal dependence in the present experiments due to the modest skill signal and relatively large sampling errors for individual seasons. Longer runs would be needed to address these issues.

[19] Figure 4 summarizes the dependence of the variability bias and the predictive skill on the model mean bias in this study. The model mean bias on the horizontal axis in these plots is measured by RMSEmean normalized by that in the control run. The corresponding variance bias as measured by the normalized RMSEσ is displayed in Figure 4 (left) while changes in the correlation skill score are shown in Figure 4 (right). All statistics are aggregated over the whole globe and over four seasons to reduce sampling errors as much as possible. The results present clear evidence for a reduction in the bias of interannual variability and an increase in predictive skill when model mean biases are reduced (i.e., model fidelity is improved).

Figure 4.

(left) The magnitude of the variance bias over the globe in 1989–2008 as measured by RMSEσ as a function of the magnitude of the mean bias as measured by RMSEmean in the bias-improved (pink) and bias-degraded runs (blue) relative to that in the control runs for sea level pressure (psl), near-surface air temperature (tas), 200-, 500-, 700-, and 850-hPa air temperature (t200, t500, t700, t850) and geopotential height (z200, z500, z700, z850). (right) The same as Figure 4 (left) but for changes in the globally averaged correlation skill score. All statistics are obtained by aggregating values for four standard seasons into annual values.

4. Summary and Discussion

[20] In the present paper, we have presented a method for constructing a set of annually varying cyclostationary run-time corrections designed to improve a GCM's present-day seasonally-varying mean climate. These empirical corrections are derived as the annual cycle of relaxation tendencies applied to the main prognostic variables at each time to keep the model evolution close to that of observation-based analyses.

[21] The applied empirical corrections improve the model climate, or degrade it if applied with the opposite sign. The bias improvements are found not only in the prognostic variables that are directly affected by the procedure but also in a number of other quantities such as precipitation, near surface temperature and mean sea level pressure, although to a lesser extent.

[22] Changes in the mean, or first moment, of the model climate are associated with corresponding changes in the second order climate statistics such as interannual variability and model predictive skill on seasonal time scales. These results are non-trivial and intriguing since the bias-correcting tendencies are cyclostationary and therefore cannot directly influence the higher order statistics of the climate variability. Since the model formulation is identical for all simulations in our study, we may ascribe all changes in model skill and variability directly to changes in the fidelity of the model's time-mean climate.

[23] Conceptually similar empirical corrections are considered in DelSole et al. [2008]. The main difference is in the method for estimating the correcting terms on the right-hand side of the model equations. They derive the empirical correction from short-term forecast tendency errors. Here the bias corrections are derived from the relaxation tendencies in the adaptation runs. DelSole et al. [2008] did not find consistent improvements in the random component of the forecast error variance in their simulations even though the model bias was reduced. One possible explanation for the apparent disagreement with this study is that their shorter 10-yr simulations may have been of insufficient length to detect the skill enhancement due to sampling issues. Another possible explanation is that their study was restricted to the predictive skill of initialized forecasts, which involves the influence of both the initial conditions and the lower boundary conditions. The present study, in contrast, examines changes in predictability associated with perfect, slowly varying, lower boundary conditions only. In any event, the predictability response to the changes in the mean bias is found to be fairly modest.

[24] The association of a model's predictive skill with its ability to reproduce the properties of the historical climate record is a central assumption of the community-wide effort to subject climate models to performance metrics. Such metrics will have a range of complexity spanning from leading order properties, such as the time-mean climate, to higher order statistics such as trends and variances. Here we have restricted our evaluation of performance to the time-mean climatology and demonstrated that there is value for future predictions in the improvement of a climate model's ability to reproduce the historical climate. While this result supports the utility of performance metrics, it must be stressed that the present study pertains only to predictive skill on seasonal time scales and that the skill enhancement realized is modest. For longer time scales of decades to centuries, which are the focus of model-based future climate projections, there could be very different influences besides the fidelity of a model's present-day climatology that determine the model's climate sensitivity and its response to anthropogenic and naturally occurring forcings.


[25] We thank Isla Simpson and Bill Merryfield, and two anonymous reviewers for their insightful comments.

[26] The Editor thanks the two anonymous reviewers for assisting in the evaluation of this paper.