A method for post-processing decadal predictions from global climate models that accounts for model deficiencies in representing climate trends is proposed and applied to decadal predictions of annual global mean temperature from the Canadian Centre for Climate Modelling and Analysis climate model. The method, which provides a time-dependent trend adjustment, reduces residual drifts that remain after applying the standard time-independent bias correction when the modelled and observed long-term trends differ. Initialized predictions and uninitialized simulations that share common specified external forcing are analyzed. Trend adjustment substantially reduces forecast errors in both cases and initialization further enhances skill, particularly for the first forecast year.
 To evaluate the ability of global climate models to predict near-term future climate, the Coupled Model Intercomparison Project Phase 5 (CMIP5) protocol [Taylor et al., 2012] (http://cmip-pcmdi.llnl.gov/cmip5/) prescribes ensembles of 10- to 30-yr hindcasts initialized from climate states observed near the end of years 1960 until present. The Canadian Centre for Climate Modelling and Analysis (CCCma) has contributed 10-year ensemble predictions initialized at the end of each year during this period. The purpose of these decadal hindcasts is two-fold. First, they enable evaluations of model biases and other systematic errors that can be used to correct such errors both in the hindcasts and in future predictions. Second, the hindcasts provide measures of historical skill that serve to quantify the expected accuracy of future predictions.
 The estimation of model biases and evaluation of forecast skill are two standard practices in climate prediction, as for example in seasonal forecasting [e.g., Merryfield et al., 2010]. Typically, stationarity of forecast errors for a particular lead time is assumed when estimating model biases for monthly to multi-seasonal climate predictions. This assumption is justified if changes in model biases over the validation period spanned by the hindcasts are much smaller than typical magnitudes of the climate anomalies being predicted. However, a model that misrepresents long-term trends such as the global warming trend of the past few decades can have systematic forecast errors that depend on the initial forecast year. The purpose of this paper is to draw attention to this issue and propose a methodology for addressing it.
2. Decadal Prediction Experiments
 We use data from decadal prediction experiments with the CCCma fourth generation global climate model CanCM4. This model is similar to the second generation Canadian Earth System Model (CanESM2) which is used for the CMIP5 long-term experiments [Arora et al., 2011] but lacks an interactive carbon cycle. The CanCM4 predictions employ historical greenhouse gas concentrations, aerosol emissions and naturally occurring solar and volcanic forcings through 2005, and follow the RCP4.5 scenario thereafter in accordance with the CMIP5 protocol. An analysis of CanCM4 skill in predicting decadal trends is presented in Fyfe et al.
 Two 10-member ensembles of simulations are considered. The first consists of 10 historical simulations for the period 1850–2005, continued as RCP4.5 simulations thereafter from which we use years 1961–2021. These runs are referred to as uninitialized decadal predictions. The second ensemble consists of decadal hindcasts initialized at the end of each year from 1960 until present. The initialization method is identical to that used to initialize CanCM4-based multi-seasonal predictions [Merryfield et al., 2011], with atmospheric, ocean and sea ice states constrained by observed conditions. This initialization is of the “full field” type because the actual observed values are assimilated, in contrast to the anomaly initialization used by some prediction systems [e.g., Smith et al., 2007].
 The bias-correcting approach developed here is introduced by applying it to decadal predictions of annual global mean near-surface temperature. Deficiencies in the forecast system may result in the forecast drifting from the observationally-constrained initial state towards the climate of uninitialized model simulations. This is illustrated schematically in Figure 1a, for an idealized case in which there is no variability other than the forced linear long-term trends. A critical point is that the observed and modelled climate responses to anthropogenic and natural forcings may differ as indicated by the differing slopes of the observed (black) and modelled (grey) lines and the offset between them at the beginning of the record.
 As time progresses, the initialized decadal predictions “drift” away from their initial states, which are close to the observations, and towards the uninitialized model climate, as indicated schematically in Figure 1a by the solid coloured curves. As a result of differences in observed and modelled long-term trends, model drift is dependent on the year in which the forecast begins. In the schematic the model drift is greater in earlier years, as the difference between the uninitialized model climate and the observed climate is larger for these years, but diminishes in the later years when the model and observed climates converge. This progression is illustrated in the lower right corner of Figure 1a, where the forecast drifts for different years are superimposed. The mean drift among this set of forecasts, termed the mean model bias, is indicated by the thick brown curve.
 Typically, the mean model bias at each lead time is removed from each climate prediction as, e.g., in Smith et al. Removing the mean model bias in Figure 1a results in bias-corrected predictions that are indicated by the dashed coloured curves. Because the mean drift under-estimates drift magnitude in the first part of the validation period, the bias-corrected predictions exhibit a residual drift toward cooler temperatures relative to the observed climate. By contrast, the bias-corrected predictions exhibit a residual drift toward warmer temperatures in the second part of the record. This problem will worsen in future forecasts if the same fixed bias correction is applied because the uninitialized model climate will continue to warm relative to the observed climate.
 If a model underestimates the long-term warming trend, the situation is opposite to that in Figure 1a and future bias-corrected predictions will exhibit systematic residual drifts toward cooler temperatures relative to the observed trend. This behaviour is largely independent of whether a forecast system uses the full-field or anomaly initialization method, and is mainly a function of the differences between the modelled and observed secular trends. The remainder of this section describes a method for mitigating these effects by taking into account differences between observed and modelled long-term trends.
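 The residual drift described for the idealized schematic can be reproduced with a few lines of Python. The sketch below is illustrative only: the 0.5 K initial offset, the trend values, the exponential relaxation toward the model climate and its 3-yr e-folding time are assumptions chosen for the example, not quantities taken from CanCM4.

```python
import math

# Idealized setup (all numbers are illustrative assumptions, not CanCM4 output).
YEARS = list(range(1961, 2012))          # verification years 1961..2011
TAU = 3.0                                # assumed drift e-folding time (yr)

def observed(t):
    """Observed climate: linear warming of 0.7 K per 50 yr."""
    return 0.7 / 50.0 * (t - 1961)

def model_climate(t):
    """Uninitialized model climate: starts 0.5 K too cold, warms 1.2 K per 50 yr."""
    return -0.5 + 1.2 / 50.0 * (t - 1961)

def forecast(start, lead):
    """Initialized forecast relaxing from observations toward the model climate."""
    t = start + lead
    w = 1.0 - math.exp(-lead / TAU)      # 0 at lead 0, approaches 1 at long leads
    return observed(t) + w * (model_climate(t) - observed(t))

def mean_bias(lead):
    """Lead-dependent mean error over all starts whose target year is verifiable."""
    starts = [s for s in YEARS if s + lead <= YEARS[-1]]
    return sum(forecast(s, lead) - observed(s + lead) for s in starts) / len(starts)

def residual(start, lead):
    """Error that remains after subtracting the fixed, lead-dependent mean bias."""
    return forecast(start, lead) - mean_bias(lead) - observed(start + lead)

early = residual(1961, 9)   # forecast started at the beginning of the record
late = residual(2002, 9)    # last start whose lead-9 target year (2011) is observed
print(f"residual at lead 9: early start {early:+.3f} K, late start {late:+.3f} K")
```

 In this toy setting the early forecast ends up cooler than observed and the late forecast warmer than observed after the fixed bias correction, which is the sign pattern described for the schematic.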
 Let $Y(t_j, l) \equiv Y_{jl}$ denote the model predictions (here we consider ensemble mean predictions and annual averages only), and $X(t_j, l) = X(t_j + l) \equiv X_{jl}$ the corresponding verifying observations, where $j = 1, \ldots, n_l$ indicates the initial year of the prediction, and $l$ is the lead time between the forecast issuance date and the start of the forecast period being considered. For annual mean decadal predictions considered here, $t_j = 1961, \ldots, 2012$ and $l = 0, \ldots, 9$ yrs. The number of forecasts $n_l$ available for validation generally depends on $l$. Given the 1961–2011 observed record, $n_l = 51$ for $l = 0$ but $n_l = 42$ for $l = 9$.
 It is convenient to represent the model predictions $Y_{jl}$ and the verifying observations $X_{jl}$ as

$$Y_{jl} = \mu_l^y + s_l^y j' + e_{jl}^y, \qquad X_{jl} = \mu_l^x + s_l^x j' + e_{jl}^x, \tag{1}$$

where $\mu_l^y$ and $\mu_l^x$ are the long-term means of the hindcasts and observations respectively in the validation period for each lead time $l$, $s_l^y j'$ and $s_l^x j'$ are the corresponding long-term linear trend components, $j'$ indicates initial year relative to the centre of the validation period, and $e_{jl}^y$ and $e_{jl}^x$ are forecast and observed deviations from the long-term linear trends.
 The standard bias correction for climate forecasts is

$$\hat{Y}_{jl} = Y_{jl} - B_l, \tag{2}$$

where the caret denotes a bias-corrected forecast, and the bias $B_l \equiv \mu_l^y - \mu_l^x$ depends only on lead time $l$. This bias provides a satisfactory correction for forecast drift only if the observed and modelled long-term trends do not substantially differ. A more general correction is obtained by replacing the modelled long-term trend in (1) with the observed trend for each $l$, i.e.,

$$\tilde{Y}_{jl} = Y_{jl} - \Delta L_{jl}, \tag{3}$$

where the tilde denotes a trend-corrected forecast, and the linear trend correction $\Delta L_{jl}$ depends on both lead time $l$ and the initial year $j$ and is given by

$$\Delta L_{jl} = B_l + \Delta S_{jl} = (\mu_l^y - \mu_l^x) + (s_l^y - s_l^x)\, j'. \tag{4}$$

The slope coefficients $s_l^y$ and $s_l^x$ in the slope correction term $\Delta S_{jl}$ are determined for each lead time $l$ by the standard least squares method. Similar trend-adjusting techniques are used by Boer for seasonal forecasts and by Fyfe et al. for decadal predictions.
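 As a minimal sketch of the two corrections for a single lead time, the following Python code estimates the means and least-squares slopes against the centred start-year index and then removes either the mean bias alone or the mean bias plus the slope difference. The function names `centred_fit` and `adjust` and the synthetic input series are hypothetical and serve only as an illustration under these assumptions.

```python
def centred_fit(values):
    """Least-squares fit values[j] ~ mu + s * j', with j' = j - mean(j).

    Returns (mu, s, jprime). Because j' is centred, mu is the sample mean.
    """
    n = len(values)
    jbar = (n - 1) / 2.0
    jprime = [j - jbar for j in range(n)]
    mu = sum(values) / n
    s = sum(jp * v for jp, v in zip(jprime, values)) / sum(jp * jp for jp in jprime)
    return mu, s, jprime

def adjust(hindcasts, observations):
    """Bias-correct and trend-correct hindcasts for one fixed lead time.

    hindcasts[j] and observations[j] are the ensemble-mean forecast and the
    verifying observation for start year j. Returns (bias_corrected,
    trend_corrected) as two lists.
    """
    mu_y, s_y, jprime = centred_fit(hindcasts)
    mu_x, s_x, _ = centred_fit(observations)
    bias = mu_y - mu_x                                  # mean bias B_l
    bias_corrected = [y - bias for y in hindcasts]
    trend_corrected = [y - bias - (s_y - s_x) * jp      # also remove slope mismatch
                       for y, jp in zip(hindcasts, jprime)]
    return bias_corrected, trend_corrected

# Synthetic noise-free example (assumed numbers): the model starts 0.4 K too
# cold and warms at 0.024 K/yr against an observed 0.014 K/yr.
obs = [0.014 * j for j in range(51)]
hind = [-0.4 + 0.024 * j for j in range(51)]
bias_corrected, trend_corrected = adjust(hind, obs)
```

 With these noise-free inputs the trend-corrected series coincides with the observations, whereas the bias-corrected series is too cold at the start of the record and too warm at the end.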
 If the trend corrections were to be applied to model hindcasts in the schematic in Figure 1a, the residual drifts resulting from differences in trends would vanish and the resulting trend-corrected hindcasts would be perfectly aligned with the observed climate trend. As formulated above, the correction accounts only for systematic differences in the linear components of observed and modelled long term climate changes. A more complex and flexible scheme for correcting higher order differences in the observed and model climate evolution in the historical period is nominally possible. However, more complex models require that additional free parameters be estimated from the available relatively short validation samples and this will reduce the robustness of the corrections. One possible way of reducing the number of free parameters and hence improving the statistical robustness of the trend correction is to assume that the trend coefficients in (4) evolve with lead time l according to a predetermined analytical form that can be specified by a small number of parameters. This approach will be illustrated in the following section where CanCM4 hindcasts of global temperature are examined.
 It is important to note that “in-sample” corrections calculated without excluding the forecast that is being corrected can lead to artificially enhanced estimates of skill, and a cross-validation approach is often employed in seasonal forecasting to diminish this phenomenon. However, such an approach is problematic in the non-stationary setting considered here because eliminating a year near the end of the validation period for example would degrade the accuracy of the estimated trend coefficients more than eliminating a year near the middle of the validation period. Here we compare the performance of initialized versus uninitialized hindcasts. Since both types of hindcasts are treated alike, any skill differences should not be a result of artificially enhanced skill due to application of in-sample corrections.
4. CanCM4 Decadal Predictions of Global Mean Temperature
Figure 1b shows the raw CanCM4 ensemble mean predictions of annual mean global temperature initialized at the beginning of each year in the 1961–2012 period, together with the observed evolution of global mean temperature derived from the Goddard Institute for Space Studies (GISS) dataset [Hansen et al., 2010] (black), and the ensemble mean of the 10 uninitialized runs (grey). The dashed straight lines indicate the linear trends fitted to the observations and uninitialized runs for the 1961–2011 period. Although the annual anomalies in any given year may differ somewhat between the GISS data and other observation-based time series, we have verified that the long-term trends and trend corrections obtained using these alternative datasets are very similar (see Figure S1 in Text S1 in the auxiliary material).
 The CanCM4 results share many common features with the schematic of Figure 1a. In particular, the CanCM4 uninitialized runs over-predict the magnitude of the long term climate trend in the last 50 years and have a time-dependent cold bias that is largest at earlier times. The model drift in the initialized predictions is therefore greatest early in the verification period and diminishes in more recent years. The CanCM4 initialization method apparently imparts an initial warm bias that causes the first forecast year to be somewhat warmer than the observed record. Subsequently the decadal hindcasts evolve toward the uninitialized model climate, although in the latest years the initialized decadal predictions tend to become cooler than the uninitialized runs. This may indicate a more subtle and complex model drift behaviour than is suggested by the simple schematic in Figure 1a. However, the CanCM4 runs reproduce the main features in the schematic reasonably well.
 Bias-corrected hindcasts obtained using the standard bias-correction method (2) are displayed in Figure 1c. As in the schematic in Figure 1a, the adjusted hindcasts tend to predict temperatures that are cooler than observed near the beginning of the validation period, and warmer than observed near the end of it. As a result, these bias-corrected hindcasts tend to predict an excessive cooling trend in the 1960s as well as an excessive recent warming trend, which will likely continue to occur in future predictions. By contrast, application of the trend-correction method (3)–(4) results in hindcast predictions that are much closer to the observed record (Figure 1d).
Figures 2a and 2b show the trend offset and slope parameters obtained for each lead time l from the initialized (red) and uninitialized (grey) hindcasts, as well as the observed record (black). The vertical bars indicate 95% confidence intervals assuming that the interannual anomalies from the long term trend are independent and normally distributed. This independence assumption is probably too optimistic for annual mean global temperature so that the confidence intervals are likely to be underestimated and should be considered a lower estimate of sampling uncertainties. The small dependence of the trend parameters on l for the observed record and in the uninitialized runs is due to slightly different validation periods for different leads l, e.g., 1961–2011 for l = 0 but 1970–2011 for l = 9.
 These estimated trend parameters reflect many features of the behaviour of decadal predictions in Figure 1. In particular, the slope of the long term trend in the uninitialized simulations is overestimated in CanCM4 compared to the observed trend (∼1.2 °C/50 yrs vs. ∼0.7 °C/50 yrs). The trend slope at lead 0 in the initialized forecasts is close to the observed value but then approaches the slope of the uninitialized runs as lead time increases. However, some features in Figures 2a and 2b deviate somewhat from those expected from the simple schematic in Figure 1a. For example, the warm bias in the first forecast year, which is clearly visible in Figure 1b, is reflected in the larger than observed offset value for the first forecast year. The trend offsets for the initialized predictions tend to undershoot those for the uninitialized runs before converging toward the uninitialized values from below. Also, the trend slope estimates for the initialized predictions remain slightly below those for the uninitialized simulations at long lead times. These deviations may be consequences of the methods used in setting up the initialized predictions, which like many such schemes lead to unrealistic transient “shocks” particularly affecting the model ocean.
 Confidence intervals in Figures 2a and 2b indicate that there are substantial sampling uncertainties in estimating the correction parameters, which could potentially degrade the forecast quality of trend-corrected future predictions. However, the behaviour of the trend slope parameters appears to follow an exponential-like evolution from the observed estimates towards values in the uninitialized runs, which suggests that the dependence of the trend slope on lead $l$ may be approximated by an exponential function of the form

$$s_l = s_\infty + (s_0 - s_\infty)\, e^{-l/l_s}, \tag{5}$$

where $s_0$ is the slope for zero lead $l = 0$, $s_\infty$ is the final slope at long lead times $l \to \infty$, and $l_s$ is the time scale of the convergence from the initial slope value towards the final slope. This reduces the number of parameters for the slope component from ten ($s_l$, $l = 0, \ldots, 9$) to three ($s_0$, $s_\infty$, $l_s$). Sampling errors are expected to be reduced as a result, provided that the analytical function chosen is suitable for this purpose. The suitability of (5) is demonstrated by the dashed curves in Figures 2a and 2b, which are obtained by estimating the parameters $\mu_l$, $l = 0, \ldots, 9$, $s_0$, $s_\infty$, and $l_s$ by the standard least squares method considering hindcasts for all starting years and all lead times simultaneously: this form fits the parameter values estimated independently for each $l$ very well.
 The exponential form in (5) appears to be flexible enough to accurately approximate the trend behaviour of the initialized and uninitialized hindcasts and the observations and should improve the statistical robustness of trend corrections. The trend-adjusted predictions in Figure 1 were obtained using this approximation. In general, it is not guaranteed that the trend coefficients will follow the exponential form (5) in other decadal prediction systems, although it is likely that their behaviour will have an exponential-like component. In practice, it would be advisable to perform goodness-of-fit tests to verify that any chosen analytical function fits trend estimates for individual time leads sufficiently well.
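 One simple way to fit an exponential dependence of slope on lead time is sketched below in Python. It is a simplification of the procedure described above: instead of fitting all start years and lead times simultaneously, it fits the three parameters to slope values already estimated for each lead, using a grid search over the time scale and ordinary least squares for the other two parameters. The function name `fit_exponential`, the grid of candidate time scales and the synthetic slopes are assumptions made for the example.

```python
import math

def fit_exponential(slopes, timescales):
    """Fit s_l = s_inf + (s0 - s_inf) * exp(-l / l_s) to per-lead slopes.

    For each candidate l_s the model is linear in (s_inf, s0 - s_inf), so
    those two are found by ordinary least squares; l_s is chosen by grid
    search over `timescales`. Returns (s0, s_inf, l_s).
    """
    n = len(slopes)
    sbar = sum(slopes) / n
    best = None
    for ls in timescales:
        g = [math.exp(-lead / ls) for lead in range(n)]
        gbar = sum(g) / n
        sgg = sum((gi - gbar) ** 2 for gi in g)
        sgs = sum((gi - gbar) * (si - sbar) for gi, si in zip(g, slopes))
        b = sgs / sgg                       # estimate of s0 - s_inf
        a = sbar - b * gbar                 # estimate of s_inf
        sse = sum((a + b * gi - si) ** 2 for gi, si in zip(g, slopes))
        if best is None or sse < best[0]:
            best = (sse, a + b, a, ls)
    return best[1], best[2], best[3]

# Synthetic slopes (K per 50 yr) generated from assumed parameters.
true_s0, true_sinf, true_ls = 0.7, 1.2, 2.0
slopes = [true_sinf + (true_s0 - true_sinf) * math.exp(-l / true_ls)
          for l in range(10)]
grid = [0.5 + 0.1 * k for k in range(96)]   # candidate l_s from 0.5 to 10.0 yr
s0, s_inf, ls = fit_exponential(slopes, grid)
```

 The residual sum of squares at the selected time scale gives a rough indication of how well the exponential form describes the per-lead estimates, in the spirit of the goodness-of-fit check recommended above.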
5. CanCM4 Decadal Prediction Skill
 This section presents basic skill measures of the CanCM4 decadal predictions of global mean temperature, comparing performance of the initialized hindcasts to that of the uninitialized runs to illustrate the added value of the initialization.
Figures 2c and 2d show the correlation skill score (standard Pearson's correlation coefficient) of annual global mean temperature as a function of lead time l = 0, …, 9 for the initialized predictions (red) and the uninitialized runs (grey), with 95% confidence intervals indicated by the vertical bars. Figure 2c shows skill of trend-adjusted hindcasts that include the long-term trend while Figure 2d shows the corresponding skill for predicting variations about the long-term linear trend. The long-term trend is seen to contribute substantially to the correlation skill. The correlation score for the bias-corrected hindcasts without trend correction (not shown) is slightly greater than that for the trend-adjusted hindcasts since the trend, and hence its contribution to the correlation score, is overestimated.
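 The contribution of a shared long-term trend to correlation skill can be illustrated with a short Python sketch that computes the Pearson correlation once for the full series and once for deviations from the fitted linear trends. The synthetic series, in which the forecast captures half of the observed anomaly and carries additional unrelated variability, is an assumption made purely for illustration.

```python
import math

def pearson(x, y):
    """Standard Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def detrend(values):
    """Remove the mean and the least-squares linear trend from a series."""
    n = len(values)
    jbar = (n - 1) / 2.0
    mean = sum(values) / n
    slope = (sum((j - jbar) * v for j, v in enumerate(values))
             / sum((j - jbar) ** 2 for j in range(n)))
    return [v - mean - slope * (j - jbar) for j, v in enumerate(values)]

# Synthetic 51-yr series (assumed): common warming trend, an "observed"
# anomaly, and forecast variability that is unrelated to the observations.
n = 51
trend = [0.014 * j for j in range(n)]
anom = [0.1 * math.sin(1.3 * j) for j in range(n)]
noise = [0.1 * math.cos(2.9 * j) for j in range(n)]
obs = [t + a for t, a in zip(trend, anom)]
fcst = [t + 0.5 * a + e for t, a, e in zip(trend, anom, noise)]

r_full = pearson(fcst, obs)                         # includes the common trend
r_detrended = pearson(detrend(fcst), detrend(obs))  # variations about the trend
print(f"with trend: {r_full:.2f}, detrended: {r_detrended:.2f}")
```

 In this example the correlation of the full series is noticeably higher than that of the detrended series, because the common trend dominates the variance of both.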
 Initialization improves the correlation skill of CanCM4 predictions of global mean temperature modestly compared to the skill of the uninitialized simulations, mainly in the first 1–2 years. In particular, correlation skill for anomalies from the long-term linear trend in the first forecast year is about 0.7 for the initialized hindcasts but only about 0.5 for the uninitialized predictions. However, this skill improvement is barely statistically significant. Merryfield et al. report good performance of CanCM4 multi-seasonal predictions (to 12 months) of the El Niño–Southern Oscillation (ENSO) phenomenon. Since global mean temperature is influenced by ENSO variability [e.g., Fyfe et al., 2010], much of the added skill in the initialized predictions in the first years is likely due to ENSO predictability.
 Following the first forecast year, correlation skill scores for the initialized hindcasts are comparable to and statistically indistinguishable from those of the uninitialized runs when evaluated independently for each lead time, although the initialized hindcasts do tend to have slightly better skill on average as indicated by the horizontal dashed lines. The average skill difference is statistically significant at the 95% confidence level, even if the first forecast year is excluded. The correlation skill scores of anomalies from the long-term trend tend to remain above ∼0.3 at all lead times for both the initialized and uninitialized runs. Presumably, some of this skill is due to responses of the climate system to external forcings, most notably volcanic forcing, that deviate from the long-term linear trend.
 Root-mean-square errors (rmse) obtained using the standard bias correction (2) and with the trend adjustment procedure (3)–(5) are compared in Figures 2e and 2f. As expected, trend adjustment improves rmse since the trend-adjusted forecasts align more closely with the observed record. Rmse is reduced by approximately a third for the uninitialized simulations and by about 20% for the initialized hindcasts. As expected, rmse for the uninitialized simulations shows little dependence on lead. The bias-corrected rmse in Figure 2e are smaller for the initialized predictions than the uninitialized runs and remain so at all leads. This difference at longer leads may be due to the slightly more realistic trend in the initialized predictions. With trend adjustment (Figure 2f) the initialized rmse remain generally lower than for the uninitialized runs although the differences lie within the confidence intervals for l ≥ 1. The average rmse difference is statistically significant, even if the first forecast year is excluded.
Figures 2g and 2h illustrate the magnitudes of observed and predicted decadal linear trends obtained through linear fits to the 10 forecast years for each starting year. The initialized (red) and uninitialized (grey) predictions generally track the GISS-based observational values (black), although the simulated values tend to overestimate the trend variability. Application of trend adjustment improves the prediction of decadal linear trends for the initialized case as indicated by the rmse of the decadal trend predictions (horizontal dashed lines). Rmse is reduced from 0.16 K/10 yrs for the bias-adjusted hindcasts to 0.12 K/10 yrs for the trend-adjusted hindcasts. This reduction of some 25% is statistically significant under the independence assumption, and remains significant at the 95% confidence level even if the effective sample size is reduced by half. The most striking changes due to the trend adjustment as compared to the bias correction are for the most recent initialized predictions: the decadal trend slope in the last 10 initialized forecasts is on average ∼0.2 K/10 yrs for the trend-adjusted predictions, as compared to ∼0.4 K/10 yrs for the bias-corrected predictions. This underlines the importance of the proposed trend adjustments for future predictions when the climate model does not accurately represent the observed long-term trends.
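 The decadal trend diagnostic can be sketched in Python as a least-squares slope fitted to the 10 forecast years of each start year, followed by an rmse between predicted and observed slopes. The helper names and the synthetic segments below are hypothetical; the warming rates are assumed values and do not reproduce the CanCM4 results quoted above.

```python
import math

def slope(values):
    """Least-squares slope of values against 0, 1, ..., len(values) - 1."""
    n = len(values)
    tbar = (n - 1) / 2.0
    return (sum((t - tbar) * v for t, v in enumerate(values))
            / sum((t - tbar) ** 2 for t in range(n)))

def decadal_trends(series_by_start):
    """Trend in K per 10 yr for each start year (slope per year times 10)."""
    return [10.0 * slope(series) for series in series_by_start]

def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(predicted))

# Synthetic 10-yr segments for three start years (assumed numbers):
# observations warm at 0.014 K/yr, forecasts at 0.024 K/yr.
obs_segments = [[0.014 * (s + l) for l in range(10)] for s in range(3)]
fcst_segments = [[0.014 * s + 0.024 * l for l in range(10)] for s in range(3)]

obs_trends = decadal_trends(obs_segments)
fcst_trends = decadal_trends(fcst_segments)
trend_rmse = rmse(fcst_trends, obs_trends)
print(f"rmse of decadal trends: {trend_rmse:.2f} K/10 yrs")
```

 Applying the same diagnostic to bias-corrected and to trend-corrected hindcasts allows the two adjustments to be compared in terms of decadal trend errors.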
 Trend adjustments similar to those described above can in theory be applied to more localized predicted quantities, such as annual mean temperature at a set of grid locations. However, caution must be exercised in doing so because estimates of local trends are less reliable than of global trends. A simple but statistically robust method for adjusting local trends is proposed in the auxiliary material and illustrated through application to the CanCM4 decadal forecast issued at the beginning of year 2012.
6. Summary and Discussion
 A method of post-processing decadal predictions in order to remove model drift has been formulated and applied. The method takes into account possible differences in the observed and modelled long-term climate behaviour in the validation period by adjusting the linear component of the long-term trend. The statistical robustness of the approach is improved by fitting an exponential-based function to parameters describing the dependence of trend slope on forecast lead time, which reduces the number of parameters to be estimated.
 The trend-adjustment method substantially reduces residual systematic drifts in decadal predictions that can remain even after applying a standard bias correction. The method was applied to decadal hindcasts and predictions of annual global mean near-surface temperature from CanCM4, whose global mean temperatures increase more rapidly than observed over the hindcast period. Errors in the trend-adjusted hindcasts are substantially reduced as compared to the case in which only a standard bias correction is applied. Initialization enhances the skill of CanCM4 decadal predictions, particularly in the first forecast year.
 CanCM4 is not the only CMIP5 decadal prediction model whose historical warming of global mean temperature is unrealistically large [e.g., Kim et al., 2012]. Such mismatches between observed and modelled trends may be a consequence of model deficiencies, but could also be a consequence of incorrect specification of external forcing, for example the incorrect specification of aerosol precursor emissions resulting in a modelled temperature trend that is too large. Since the trend adjustment procedure described here is empirical, it can nominally correct for mismatches between observed and forecast linear trends that arise for either reason.
 We thank John Fyfe and Greg Flato, and two anonymous reviewers for their insightful comments.
 The Editor thanks the two anonymous reviewers for assisting in the evaluation of this paper.