We compare observed decadal trends in global mean surface temperature with those predicted using a modelling system that encompasses observed initial condition information, externally forced response (due to anthropogenic greenhouse gases and aerosol precursors), and internally generated variability. We consider retrospective decadal forecasts for nine cases, initiated at five year intervals, with the first beginning in 1961 and the last in 2001. Forecast ensembles of size thirty are generated from differing but similar initial conditions. We concentrate on the trends that remain after removing the following natural signals in observations and hindcasts: dynamically induced atmospheric variability, El Niño-Southern Oscillation (ENSO), and the effects of explosive volcanic eruptions. We show that ensemble mean errors in the decadal trend hindcasts are smaller than in a parallel set of uninitialized free running climate simulations. The ENSO signal, which is skillfully predicted out to a year or so, has little impact on our decadal trend predictions, and our modelling system possesses skill, independent of ENSO, in predicting decadal trends in global mean surface temperature.
 The extent to which the time evolution of climate anomalies may be skillfully predicted from years to a decade or more has yet to be fully investigated. Early efforts at initialized coupled ocean-atmosphere predictions have shown some promise [Smith et al., 2007; Keenlyside et al., 2008; Meehl et al., 2009, and references therein]. In this study we focus on one important aspect of the problem, namely the prediction of linear trend in global mean surface temperature over a decade (in units of °C per decade). To this end, we analyze retrospective forecasts made with a modelling system initialized using observation based information, and forced with anthropogenic (greenhouse gases and aerosol precursors) and natural (volcanic and solar) forcing over the period from 1961 to 2010. The linear trends in the hindcasts are compared to those derived from two observation based analyses. Our use of multiple analyses is motivated by the fact that differing treatments can result in differing perceptions of decadal linear rates of global temperature change [Hansen et al., 2010]. Another important aspect of our study is the estimation and removal of the natural variability signals associated with short term fluctuations in atmospheric circulation, El Niño-Southern Oscillation (ENSO), and explosive volcanic eruptions. This helps make clearer where forecast trends deviate or do not deviate from observed trends.
2. Data and Models
Two observational analyses are used in this study. The combined land surface temperature and sea surface temperature (SST) analysis of the UK Met Office Hadley Centre and the University of East Anglia Climatic Research Unit (HadCRUT) [Brohan et al., 2006] is available as monthly-mean values on a 5° (latitude) by 5° (longitude) grid. The updated Goddard Institute for Space Studies (GISS) surface temperature analysis [Hansen et al., 2010] is available as monthly-mean values on a 1° (latitude) by 1° (longitude) grid, which are interpolated onto a 5° (latitude) by 5° (longitude) grid with 1200 km spatial smoothing. Both analyses are expressed as anomalies with respect to the 1961 to 1990 base period. We also use sea-level pressure (SLP) data provided by the National Center for Atmospheric Research Data Support Section, given as monthly means on a 5° (latitude) by 5° (longitude) grid as described by Trenberth and Paolino. The seasonal cycle is removed by subtracting climatological monthly means.
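As an illustration of the last step, the following minimal sketch removes a seasonal cycle from a single monthly series. It is not the processing code used for the analyses: the function name is hypothetical, the series is assumed to start in January and span whole years, and the climatology is taken over the full record rather than over a fixed base period.

```python
def remove_seasonal_cycle(monthly_values):
    """Subtract the climatological mean of each calendar month.

    Assumptions of this sketch: the series starts in January, covers
    whole years, and the climatology is computed over the full record.
    """
    if len(monthly_values) % 12 != 0:
        raise ValueError("expected whole years of monthly data")
    n_years = len(monthly_values) // 12
    # Mean over all years for each of the 12 calendar months.
    climatology = [
        sum(monthly_values[month::12]) / n_years for month in range(12)
    ]
    # Anomaly = value minus the climatological mean of its calendar month.
    return [
        value - climatology[i % 12] for i, value in enumerate(monthly_values)
    ]
```

Restricting the climatology to a base period such as 1961 to 1990 would only require slicing the input before computing `climatology`.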
We employ version 4 of the Canadian Global Climate Model (CanCM4), which incorporates observed anthropogenic concentrations of greenhouse gases, emissions of aerosol precursors, and naturally occurring forcings (volcanic and solar) up to the year 2005. Beyond 2005 the RCP4.5 medium mitigation scenario is employed. The model is very similar to the second generation Canadian Earth System Model (CanESM2) described by Arora et al., except that it has no interactive carbon cycle. Modelled monthly mean surface temperature and SLP fields are interpolated onto the same 5° (latitude) by 5° (longitude) grid as the observed analyses, and are thereafter treated identically to the analyses, including masking with either the HadCRUT or GISS coverage.
We consider four sets of simulations. The first set consists of ten historical simulations begun from 1850 with initial conditions drawn at 50-year intervals from a long CanESM2 control run. These are climate simulations (referred to here as “freecasts”) which evolve freely based on the specified external forcing. The second set consists of ten simulations initialized by constraining atmospheric and surface fields in the coupled model to remain close to observed values up to the start dates of the “hindcasts” in January 1961, 1966, 1971, 1976, 1981, 1986, 1991, 1996, and 2001. The ten simulations differ in their initial values, and so yield an ensemble of ten hindcasts beginning from slightly differing initial conditions. Two additional hindcast ensembles, each of size ten, incorporate observed subsurface ocean temperatures, introduced directly or as anomalies, using the off-line variational assimilation technique of Tang et al., with adjustments to subsurface salinity as by Troccoli et al. This variety of initialization techniques reflects the still-experimental nature of decadal prediction. The simulations follow a decadal prediction protocol established under phase five of the Coupled Model Intercomparison Project (CMIP5).
Since observation-based and model-based climates tend to differ, hindcasts which are initialized to be near the observations tend to drift towards the model climate. For short term hindcasts this is accounted for by removing the mean bias. However, for longer term decadal hindcasts a linear trend correction may be required if the model does not reproduce long-term trends. For this reason, we correct for systematic long-term trend biases following a procedure detailed in the auxiliary material. We process the three sets of hindcasts using the different initialization techniques separately, but combine the predicted anomalies into one thirty-member ensemble in the following analysis. The ten-member ensemble of freecasts is also trend corrected in this way.
Following Thompson et al. and Fyfe et al., we represent the global mean surface temperature as $T_{\mathrm{GLOBAL}}(t) = c + \alpha t + T_{\mathrm{DYNAMIC}}(t) + T_{\mathrm{ENSO}}(t) + T_{\mathrm{VOLCANO}}(t) + \epsilon(t)$, where $c$ is a constant, $\alpha t$ is a linear trend term, and $T_{\mathrm{DYNAMIC}}(t) = \mu\,\mathrm{DYNAMIC}(t)$, $T_{\mathrm{ENSO}}(t) = \nu\,\mathrm{ENSO}(t)$, and $T_{\mathrm{VOLCANO}}(t) = \xi\,\mathrm{VOLCANO}(t)$ attempt to represent dynamically induced atmospheric variability, the ENSO signal, and the effects of explosive volcanic eruptions, respectively. The parameters $\alpha$, $\mu$, $\nu$, and $\xi$ are obtained from a multivariate regression using a prescribed first order autoregressive model for the noise $\epsilon(t)$. Confidence intervals on the parameters follow from this model for $\epsilon(t)$. $\mathrm{DYNAMIC}(t)$ is estimated by regressing SLP anomaly maps onto normalized land-ocean temperature difference time series for the Northern Hemisphere. $\mathrm{ENSO}(t)$ is estimated using an ocean mixed layer model $C_{\mathrm{ENSO}}\,\frac{d}{dt}\mathrm{ENSO}(t) = F(t) - \mathrm{ENSO}(t)/\beta_{\mathrm{ENSO}}$, where $F(t)$ is an SST-based estimate of anomalous heat flux in the eastern tropical Pacific; $\beta_{\mathrm{ENSO}}$ is a linear damping coefficient set to $2/3\ \mathrm{K\,W^{-1}\,m^{2}}$; and $C_{\mathrm{ENSO}}$ is an effective heat capacity chosen such that the correlation coefficient between $\mathrm{ENSO}(t)$ and $T_{\mathrm{GLOBAL}}(t)$ is maximized. $\mathrm{VOLCANO}(t)$ is obtained similarly but with $F(t)$ derived from optical thickness data.
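The mixed layer model above can be sketched numerically as follows. This is a minimal illustration and not the implementation used in the study: the forward Euler time stepping, the grid search over candidate heat capacities, and all function names are assumptions of the sketch, and the forcing, heat capacity and time step are assumed to be in mutually consistent units.

```python
def integrate_mixed_layer(forcing, heat_capacity, beta=2.0 / 3.0, t0=0.0, dt=1.0):
    """Integrate C dT/dt = F(t) - T/beta with forward Euler (an assumption).

    Returns a series of the same length as `forcing`, starting from t0.
    The scheme is stable only if dt / (heat_capacity * beta) < 2.
    """
    temps = [t0]
    for f in forcing[:-1]:
        t = temps[-1]
        temps.append(t + dt * (f - t / beta) / heat_capacity)
    return temps


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_heat_capacity(forcing, target, candidates, beta=2.0 / 3.0):
    """Pick the candidate heat capacity whose response correlates best
    with `target` (a simple grid search, assumed for illustration)."""
    return max(
        candidates,
        key=lambda c: correlation(integrate_mixed_layer(forcing, c, beta), target),
    )
```

Under constant forcing the integrated response relaxes to `beta * F`, which gives a quick sanity check on the sign and scaling of the damping term.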
Decadal trends, our focus, are estimated as follows. First, we obtain $\mathrm{ENSO}(t)$ and $C_{\mathrm{ENSO}}$ using continuous observations from 1961 to 2010. Then, for each decade we integrate the mixed layer model from its start date $t_0$ using $\mathrm{ENSO}(t_0)$ and $C_{\mathrm{ENSO}}$ to obtain a new $\mathrm{ENSO}(t)$ of ten year duration. Finally, for each decade we perform a regression of the form $T_{\mathrm{GLOBAL}}(t) - T_{\mathrm{VOLCANO}}(t) = c + \alpha t + \mu\,\mathrm{DYNAMIC}(t) + \nu\,\mathrm{ENSO}(t) + \epsilon(t)$, where $\alpha$ is the desired decadal trend (in units of °C per decade). The volcanic signal, from the full period analysis, is not explicit in the regression since for some decades it is either negligible or overlapping the linear trend term. This procedure is highly robust, as seen by the fact that the full period and decadal time series are virtually indistinguishable (see Figure S1 in the auxiliary material). Freecast decadal trends are similarly obtained. For the hindcasts, we use freecast values of $C_{\mathrm{ENSO}}$ and $T_{\mathrm{VOLCANO}}(t)$ since full period values of these quantities are unavailable for the hindcasts.
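The final regression step can be sketched as an ordinary least squares fit. This is a simplified illustration under stated assumptions: it returns point estimates only and omits the first order autoregressive noise model used for the confidence intervals, the input temperature is assumed to already have the volcanic signal subtracted, the data are assumed monthly, and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def decadal_trend(temperature, dynamic, enso):
    """Least squares fit of temperature = c + alpha*t + mu*dynamic + nu*enso.

    Inputs are monthly series of equal length; time is measured in
    decades so that alpha comes out in degrees per decade.
    Returns [c, alpha, mu, nu].
    """
    n = len(temperature)
    time_in_decades = [k / 120.0 for k in range(n)]
    columns = [[1.0] * n, time_in_decades, dynamic, enso]
    # Normal equations (X^T X) p = X^T y for the four regressors.
    normal = [
        [sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns
    ]
    rhs = [sum(p * y for p, y in zip(ci, temperature)) for ci in columns]
    return solve_linear_system(normal, rhs)
```

Solving the normal equations directly is adequate for four well-scaled regressors; a QR-based solver would be a more robust choice if the regressors were nearly collinear.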
Figure 1 shows the evolution from 1961 to 2010 of the HadCRUT (left) and GISS (right) monthly-mean global-mean surface temperature anomalies (top), natural signals (middle), and residual time series (bottom) computed as $T_{\mathrm{RESIDUAL}}(t) = T_{\mathrm{GLOBAL}}(t) - [T_{\mathrm{DYNAMIC}}(t) + T_{\mathrm{ENSO}}(t) + T_{\mathrm{VOLCANO}}(t)] = c + \alpha t + \epsilon(t)$, so that $T_{\mathrm{RESIDUAL}}(t)$ represents the component of global temperature that is linearly unrelated to the natural climate signals as computed here. These calculations are obtained using continuous data for the full period from 1961 to 2010. While the HadCRUT and GISS time series compare very well in most regards, we note that the long term trend in HadCRUT is about 10% smaller than in GISS, due primarily to a mismatch over the past decade. As demonstrated by Hansen et al., the trend discrepancy over the past decade reflects the fact that the HadCRUT analysis excludes much of the Arctic, where warming has been especially large in the past decade, while the GISS analysis includes anomalies throughout most of the Arctic. This fact is made clear when comparing maps of HadCRUT and GISS trends for 2001 to 2010 (see Figure S2 in the auxiliary material).
 We now characterize some relevant aspects of our modelling system. Figure 2 (top) compares the standard deviation σ of the natural signals, as plotted in Figure 1, which are obtained using continuous data for the full period from 1961 to 2010. We perform two sets of calculations, one using the HadCRUT masking which we compare to the HadCRUT analysis (left), and the other using the GISS masking which we compare to the GISS analysis (right). For the observations, we plot σ with a vertical white line and its 95% confidence interval with a horizontal grey bar. Confidence intervals on σ follow from confidence intervals on the respective natural signal regression parameters. For the freecasts, we plot the ten-member ensemble mean σ with a vertical white line and the ten-member ensemble mean 95% confidence interval with a horizontal red bar. We see that the model overestimates the observed influence of ENSO and the volcanic response by about 30% in this calculation. These differences, while usefully noted, are not an issue here given our ultimate focus on the residual time series which exclude these natural signals.
Figure 2 (bottom) compares the 1961 to 2010 trend coefficient $\alpha$ in the full time series, and in the residual time series where the natural climate signals are removed. The ensemble mean freecast uncertainty in $T_{\mathrm{GLOBAL}}(t)$ is much larger than observed as a consequence of the overestimated natural signals. Hence, the freecast and HadCRUT trends cannot be statistically distinguished on this basis. However, with the natural signals removed, as in $T_{\mathrm{RESIDUAL}}(t)$, it is apparent that the freecasts overestimate the observed rate of long term change by about 30%. This bias is on the high side but of a magnitude not entirely uncommon amongst climate models [Fyfe et al., 2010], and is not an issue here since we ultimately correct for trend biases.
We now examine the decadal trends $\alpha_{mn}$, where $m = 1, \ldots, M$ denotes decade and $n = 1, \ldots, N$ denotes ensemble member. Figure 3 shows the cumulative distribution of observed (black; $N = 1$), freecast (red; $N = 10$), and hindcast (blue; $N = 30$) decadal trends. These distributions indicate about a 40% probability of a decadal trend not being statistically greater than zero, as judged by the point below which the 95% confidence intervals (grey shaded) systematically overlap zero. This illustrates the point, often misunderstood by the general public, that despite anthropogenic GHG warming the climate may produce decades where global mean surface temperature shows no warming, and possibly even cooling [Easterling and Wehner, 2009; Knight et al., 2009]. We note that the hindcast distributions are somewhat more constrained, i.e. are narrower and more compact, than the freecast distributions. This finding foreshadows our key result concerning trend predictability.
We now consider ensemble spread in the simulated trend for the $m$th decade, defined here as $S_m = \sum_{n=1}^{N} |\alpha_{mn} - \bar{\alpha}_{m\cdot}|/N$ with $\bar{\alpha}_{m\cdot} = \sum_{n=1}^{N} \alpha_{mn}/N$, and the overall spread $S = \sum_{m=1}^{M} S_m/M$ (where dots indicate the average over that index). We use absolute difference as our metric to avoid excessive weighting of outliers, although we find that using squared difference has little impact on our overall conclusions. Confidence intervals on $S$ and $S_m$ are obtained using bootstrapping which, aside from independence, makes no assumptions about the underlying distributions. Figure 4 (top) shows $S_m$ for the individual decades, and average $S$ in the right-most column labeled 1961–2010. We note that the difference between the freecast and hindcast value of $S$ is not statistically distinguishable from zero at the 95% confidence level, as determined by bootstrapping the $M$ values of $\Delta S_m = S_m^h - S_m^f$, where $h$ and $f$ denote the hindcast and freecast value, respectively. In other words, we find no statistically significant evidence for observed initial condition information impacting ensemble spread, although the statement is conditional on the small sample size available.
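The per-decade spread, the overall spread, and a bootstrap confidence interval on a mean of decade-level values can be sketched as below. This is an illustrative sketch only: the percentile bootstrap, the number of resamples, the fixed seed, and the function names are assumptions, since the exact bootstrap variant is not specified here.

```python
import random


def ensemble_spread(trends):
    """trends[m][n] is the trend for decade m and ensemble member n.

    Returns (per-decade mean absolute deviations, their average).
    """
    per_decade = []
    for members in trends:
        mean = sum(members) / len(members)
        per_decade.append(sum(abs(a - mean) for a in members) / len(members))
    return per_decade, sum(per_decade) / len(per_decade)


def bootstrap_mean_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    Resamples with replacement, assuming the values are independent.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_resamples)
    )
    lower = means[int((1.0 - level) / 2.0 * n_resamples)]
    upper = means[int((1.0 + level) / 2.0 * n_resamples) - 1]
    return lower, upper
```

Applying `bootstrap_mean_interval` to the per-decade differences between hindcast and freecast spread, and checking whether the interval contains zero, mirrors the significance test described above.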
Next we consider average absolute ensemble mean error $E = \sum_{m=1}^{M} |E_m|/M$, where $E_m = \sum_{n=1}^{N} (\alpha_{mn} - \alpha_{mo})/N = \bar{\alpha}_{m\cdot} - \alpha_{mo}$, and $\alpha_{mo}$ is the $m$th decade observed trend. Figure 4 (bottom) shows $E_m$ for the individual decades, and average $E$ in the right-most column labeled 1961–2010. We see that the hindcast value of $E$ (blue) is slightly smaller than the freecast value of $E$ (red) with verification against HadCRUT (left), and more distinctly smaller with verification against GISS (right). In either case, the difference between the freecast and hindcast value of $E$ is statistically distinguishable from zero at the 95% confidence level, as determined by bootstrapping the $M$ values of $\Delta |E_m| = |E_m^h| - |E_m^f|$, where $h$ and $f$ denote the hindcast and freecast value, respectively. Thus, observed initial condition information results in smaller ensemble mean errors. To further quantify this impact, we define absolute ensemble mean skill as $S = 1 - E^h/E^f$, where values of zero and one indicate no skill and perfect skill with respect to the freecasts, respectively. For the HadCRUT case $S = 0.20$ and for the GISS case $S = 0.44$. Similar values are obtained using the individual ensemble sets from the three different initialization techniques. The ensemble-mean linear correlation between the freecast and observed trends, and between the hindcast and observed trends, is about 0.4 and 0.5, respectively.
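The error and skill measures defined above reduce to a few lines of arithmetic, sketched here with hypothetical function names and made-up input values chosen only to make the calculation easy to follow.

```python
def ensemble_mean_error(trends, observed):
    """trends[m][n]: trend for decade m, member n; observed[m]: observed trend.

    Returns (per-decade ensemble mean errors, average absolute error).
    """
    errors = [
        sum(members) / len(members) - obs
        for members, obs in zip(trends, observed)
    ]
    return errors, sum(abs(e) for e in errors) / len(errors)


def skill(hindcast_error, freecast_error):
    """Skill relative to the freecasts: 0 is no skill, 1 is perfect."""
    return 1.0 - hindcast_error / freecast_error


# Illustrative (made-up) trends for two decades and two members each.
observed = [0.1, 0.3]
_, e_hindcast = ensemble_mean_error([[0.1, 0.3], [0.2, 0.2]], observed)
_, e_freecast = ensemble_mean_error([[0.3, 0.5], [0.0, 0.2]], observed)
example_skill = skill(e_hindcast, e_freecast)
```

Because the errors enter as a ratio, the skill is negative whenever the hindcast error exceeds the freecast error.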
In the above, estimates of the ENSO signal in global temperature were removed. Given that ENSO has a predictable component it is reasonable to assume that its presence could affect our results, presumably for better but possibly for worse. For this reason we repeat our analysis using $T_{\mathrm{RESIDUAL}}(t) + T_{\mathrm{ENSO}}(t)$ instead of $T_{\mathrm{RESIDUAL}}(t)$. First, we demonstrate (see Figure S3 in the auxiliary material) that our modelling system skillfully predicts ENSO-related monthly anomalies out to a year or so. Second, we note that with ENSO present the freecast and hindcast ensemble mean errors increase in roughly the same proportion, such that the skill when verifying against HadCRUT goes from about 0.20 to 0.16, and against GISS from about 0.44 to 0.30. In other words, the ENSO signal slightly degrades our model's ability to predict decadal trends in global temperature. We note that the inclusion of $T_{\mathrm{DYNAMIC}}(t)$ and $T_{\mathrm{VOLCANO}}(t)$ in our skill calculations does not significantly change our main conclusion that initialization leads to more skillful predictions.
 This study compares observed decadal trends in global mean surface temperature with those predicted using a modelling system that allows for observed initial condition information, externally forced response (due to anthropogenic greenhouse gases and aerosol precursors), and internally generated variability. We consider ensembles of retrospective forecasts, as well as ensembles of uninitialized climate simulations which evolve freely under the same external forcing. The experimental design follows the decadal prediction protocol established under phase five of the Coupled Model Intercomparison Project (CMIP5). Our main finding is that our modelling system possesses skill, independent of ENSO, in predicting decadal trends in global mean surface temperature – due to an improved representation of decadal variability. Continuing work aims at deepening our understanding of why initial condition information in our model, and possibly other models participating in the CMIP5 exercise, enhances decadal trend prediction.
 We are very grateful to J.S. Scinocca and N. Gillett for reading an early draft of the paper, and providing invaluable suggestions and advice.
 The Editor thanks two anonymous reviewers for their assistance in evaluating this paper.