This article aims at quantifying the improvement in climate prediction skill as a function of temporal (from monthly to decadal) and spatial scales (from grid point to global) when initializing a perturbed parameter ensemble of the Hadley Centre Climate Model. The focus is on near-surface temperature and precipitation in the Tropical band, the Northern and Southern hemispheres. For temperature, the forecast system reproduces the dominant impact of the external forcing at global spatial scale and at decadal time scales. There are significant improvements with initialization for the first 40 forecast months in the global and tropical domains. In the Northern (Southern) hemisphere, the initialization increases the skill in the first 12 (20) months on regional but not hemispheric scales. The initialization has a stronger impact in the model variants with a weaker global-mean temperature trend. For precipitation, the initialization corrects the negative correlation found at global and tropical scales.
 Providing climate change information at interannual time scales that could be useful to the agriculture, energy, or health sectors is one of the main societal priorities. Decadal predictions aim to satisfy this urgent demand, focusing on time scales of several years to a few decades. While climate change projections focus on reproducing the long-term trend in climate variables, decadal prediction also aims at modeling the low-frequency variability superimposed on any radiatively forced climate change [Meehl et al., 2009]. The potential predictability can be quantified by the ratio of the temporal variability filtered over decadal timescales to the total variability. Regions exhibiting potential predictability indicate where there is a chance to find predictive skill at such timescales. Because of the large heat-storage capacity of the ocean and its slow release, most of this low-frequency variability comes from the ocean and is driven by different mechanisms. Decadal potential predictability is found over the oceans at mid to high latitudes [Boer and Lambert, 2008]. Previous works have shown that the Atlantic meridional overturning circulation is potentially predictable a decade in advance, and a perfect model study of potential predictability has also shown that, in the North Atlantic, many variables such as sea surface temperature (SST), salinity, heat content, or meridional transport could be potentially predictable for many years [Griffies and Bryan, 1997a, 1997b]. In the Pacific, the model-based study by Branstator et al. shows that the decadal variability observed in the North Pacific SST is pronounced but not necessarily predictable. Guemas et al. instead attribute the inability to predict the North Pacific SST to model deficiencies.
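The variance ratio that defines potential predictability can be illustrated with a short Python sketch. The low-pass filter is assumed here to be a simple running mean over a 10-step window; the studies cited above may use a different filter, and the function names are illustrative only.

```python
from statistics import pvariance


def running_mean(series, window):
    """Return the mean of every full window of consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def potential_predictability(series, window=10):
    """Ratio of the low-pass-filtered variance to the total variance.

    Assumption: the decadal filter is a running mean of length `window`.
    """
    total_var = pvariance(series)
    if total_var == 0:
        raise ValueError("series has zero variance")
    return pvariance(running_mean(series, window)) / total_var
```

A series dominated by a slow trend gives a ratio close to 1, whereas a series that flips sign at every step gives a ratio of 0, because the running mean removes all of its variability.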
 The study of potential predictability has led to attempts to produce actual predictions. Smith et al., Keenlyside et al., Pohlmann et al., and Mochizuki et al. obtained good results in the North Atlantic and North Pacific. In climate predictions, the following sources of uncertainty compromise the forecast quality [Robson, 2010]:
 Internal variability: the natural (and not externally radiatively forced) variability of the climate system.
 Model inadequacy: the parametrization of the physical processes is a source of uncertainty as the estimation of the parameters introduces errors into the model. Moreover, some processes are not even simulated because they are not known yet.
 Scenario uncertainty: uncertainties due to the unforeseeable evolution of socioeconomic conditions, which influence the change in greenhouse gas emissions.
 The predictability of the internal variability is associated with information contained in the initial conditions. It is measured by determining for how long the predicted distribution of an ensemble of similar initial states is distinguishable from the climatological distribution [Teng and Branstator, 2010]. While weather and interannual climate predictions attempt to address this source of uncertainty, climate change projections do not. The relative importance of the initial conditions in climate prediction is supposed to vary with the time scale and has been assumed to be a continuous function that decreases with forecast time, becoming negligible after several decades [Hawkins and Sutton, 2009]. It has been shown that there is skill beyond the first forecast year and that the quality of the information about the initial state can improve the climate forecasts in different regions [Doblas-Reyes et al., 2013].
 This work compares the predictive skill for near-surface temperature and precipitation of an experiment initialized with observed data and one with no initialization, both experiments accounting for the variable radiative forcing. Note that this is not a predictability study in the sense of Hawkins and Sutton, but instead an analysis of the actual predictive skill. The following objectives have been addressed:
 the identification of the spatial and time scales at which maximum skill is found, and
 a quantification of the skill improvement provided by the initialization.
Section 2 presents the details of the data and the method implemented. Section 3 shows the results for near-surface temperature and precipitation, and Section 4 draws the conclusions and suggests open issues that could lead to further work.
2 Data and Method
 The decadal hindcasts employed are from the perturbed-parameter ensemble of the Met Office Decadal Prediction System (DePreSys_PP). DePreSys [Smith et al., 2007; Robson, 2010] is based on the global coupled ocean-atmosphere model HadCM3 [Gordon et al., 2000]. The perturbed-parameter ensemble is composed of eight model variants with simultaneous perturbations to 29 atmosphere and sea-ice parameters [Murphy et al., 2004], plus the standard model version. The selection of the variants guarantees an approximately uniform sample of climate sensitivity and a wide range of different parameter settings to sample the model uncertainty.
 The atmospheric resolution is 2.5° × 3.75° with 19 vertical levels, while the ocean component has a resolution of 1.25° × 1.25° with 20 vertical levels.
 The decadal hindcasts consist of a set of 10-year-long retrospective forecasts, starting every November from 1960 until 2005. Two experiments with the same forecast system are compared here. The NoAssim decadal hindcasts are initialized from nine transient simulations with information about greenhouse gases, tropospheric and stratospheric ozone concentration, and sulfur emissions taken from observations. The volcanic aerosol load is damped with a 1 year e-folding time. The variability in solar radiation is represented by repeating the previous 11 year solar cycle. This gives confidence in the reliability of the operational decadal forecast system, unlike the Fifth Coupled Model Intercomparison Project (CMIP5) [Taylor et al., 2012] hindcasts, which prescribe observed volcanic aerosols and solar irradiance along the predictions. Starting from the same transient runs with identical external forcings, the Assim experiment is initialized by assimilating atmospheric observations of horizontal winds, temperature, and surface pressure, and ocean observations of temperature and salinity.
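The damping of the volcanic aerosol load can be sketched in Python as an exponential decay. This is only an illustration under stated assumptions: the load is assumed to relax toward zero from its value at the start date, time is counted in months, and the function and parameter names are hypothetical rather than taken from DePreSys.

```python
import math


def damped_aerosol_load(initial_load, months_since_start, e_folding_months=12.0):
    """Exponentially damp an initial volcanic aerosol load.

    Assumption: the load decays toward zero with a 1 year (12 month)
    e-folding time, so it falls to 1/e of its initial value after 12 months.
    """
    return initial_load * math.exp(-months_since_start / e_folding_months)


# Load at each of the 120 forecast months of one hindcast, for a unit initial load.
loads = [damped_aerosol_load(1.0, month) for month in range(1, 121)]
```

With this choice the load drops to about 37% of its initial value after the first forecast year and is negligible by the end of the 10-year hindcast.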
 The study has been carried out considering the following domains: the global (poles excluded, i.e., 60°N–60°S), the Northern hemisphere (NH, poles excluded, i.e., 20°N–60°N), the Southern hemisphere (SH, poles excluded, i.e., 20°S–60°S), and the Tropical band (TRO, 20°S–20°N). The skill measure used here is the correlation computed along the space and time dimensions. For the temperature, the reference data are GHCN [Fan and van den Dool, 2008] for land and ERSST for ocean [Smith et al., 2008], both available until early 2010 at the time of the study. For precipitation, the reference is the CRU data [Brohan et al., 2006], which is available over land until the end of 2006.
 The anomalies at each grid point and for each of the 120 time steps of the hindcasts have been computed using the per-pair method [García-Serrano and Doblas-Reyes, 2012], in which the computation of the lead-time-dependent climatology accounts only for the years in which both observational and model data are available. Depending on the reference data available, and in order to guarantee the same validation sample at all forecast times (i.e., the same amount of reference data at every forecast time), the analysis employs the start dates in the range 1960–1999 for temperature (the last hindcast starting in November 1999 and finishing in October 2009, when the observations were still available) and 1960–1996 for precipitation (the last hindcast starting in November 1996 and finishing in October 2006, when the observations were still available).
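The per-pair anomaly computation described above can be sketched in Python for a single grid point. The sketch assumes that the data are stored as nested lists indexed by start date and forecast month, with `None` marking missing values; these conventions and the function name are illustrative and are not taken from the cited method.

```python
def per_pair_anomalies(model, obs):
    """Compute lead-time-dependent anomalies using only paired data.

    model[s][t] and obs[s][t] hold the value for start date s and forecast
    month t, or None when missing. Entries stay None where a pair is incomplete.
    """
    n_starts = len(model)
    n_leads = len(model[0])
    model_anom = [[None] * n_leads for _ in range(n_starts)]
    obs_anom = [[None] * n_leads for _ in range(n_starts)]
    for t in range(n_leads):
        # Start dates for which BOTH model and reference exist at this forecast time.
        paired = [s for s in range(n_starts)
                  if model[s][t] is not None and obs[s][t] is not None]
        if not paired:
            continue
        # The climatology of each data set is computed over the same sample.
        model_clim = sum(model[s][t] for s in paired) / len(paired)
        obs_clim = sum(obs[s][t] for s in paired) / len(paired)
        for s in paired:
            model_anom[s][t] = model[s][t] - model_clim
            obs_anom[s][t] = obs[s][t] - obs_clim
    return model_anom, obs_anom
```

Because both climatologies are estimated over identical start dates, a start date with missing reference data cannot shift the model climatology relative to the observed one.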
 The forecast-time accumulation is performed by accumulating data for consecutive forecast months, up to accumulating the whole forecast period:
where $x_{t,i,j}$ is the ensemble mean anomaly at forecast time $t \in [1, 120]$, at latitude $i$ between the minimum and maximum latitude of the domain, and at longitude $j$ between the minimum and maximum longitude of the domain. The ensemble mean is the average of the anomalies obtained for each model version.
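A minimal Python sketch of the forecast-time accumulation at one grid point follows. It assumes that the accumulation over the first T forecast months is their running mean; a plain sum would differ only by the factor 1/T, which does not change a correlation computed at fixed T. The function name is hypothetical.

```python
def accumulate_forecast_time(anomalies):
    """Return the running mean of the anomalies over forecast time.

    anomalies[t] is the ensemble mean anomaly at forecast month t + 1.
    Element T - 1 of the result is the mean over the first T forecast months.
    """
    accumulated = []
    running_sum = 0.0
    for n_months, value in enumerate(anomalies, start=1):
        running_sum += value
        accumulated.append(running_sum / n_months)
    return accumulated
```

For a 120-month hindcast, the first element is the first forecast month on its own and the last element is the average over the whole forecast period.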
 In order to illustrate the skill dependence with the spatial scale, between grid point level and global average, the immediate neighbors at each grid point have been averaged along all the possible directions. The case of zero neighbors is the original grid, and the maximum amount of neighbors represents the area average of the domain defined by its latitudinal extension. Successively, the temporal variances and covariances between model and observed anomalies have been computed for each spatial averaging. Finally, the correlation has been computed from the spatial average of variance and covariance values:
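$$\rho_g = \frac{\overline{\mathrm{cov}(x,y)}^{\,g}}{\sqrt{\overline{\mathrm{var}(x)}^{\,g}\;\overline{\mathrm{var}(y)}^{\,g}}}$$
where $\rho_g$ is the correlation for the degree of spatial averaging $g$; $\overline{\mathrm{cov}(x,y)}^{\,g}$ is the temporal covariance between the model ($x$) and the reference data ($y$), averaged over the spatially averaged data $g$; analogously, $\overline{\mathrm{var}(x)}^{\,g}$ is the temporal variance of the model averaged with the spatial averaging $g$, and $\overline{\mathrm{var}(y)}^{\,g}$ is the temporal variance of the reference data averaged over the spatial averaging $g$.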
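The correlation for a given degree of spatial averaging can be sketched in Python as follows. The sketch assumes that the fields have already been spatially averaged to degree g and are stored as one anomaly time series per point; population (1/n) covariances and an unweighted mean over points are used for simplicity, which is an assumption since the text does not specify any area weighting.

```python
import math


def temporal_cov(a, b):
    """Population covariance of two equally long time series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / n


def scale_correlation(model, ref):
    """Correlation built from spatially averaged covariance and variances.

    model[p] and ref[p] are the time series at point p of the grid that has
    been spatially averaged to degree g.
    """
    n_points = len(model)
    mean_cov = sum(temporal_cov(m, r) for m, r in zip(model, ref)) / n_points
    mean_var_model = sum(temporal_cov(m, m) for m in model) / n_points
    mean_var_ref = sum(temporal_cov(r, r) for r in ref) / n_points
    return mean_cov / math.sqrt(mean_var_model * mean_var_ref)
```

Averaging the covariance and variances before forming the ratio gives more weight to points with large variability than averaging point-wise correlations would.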
 A Student's t-test has been applied, and the significance at the p-values 0.01, 0.05, and 0.1 has been calculated and plotted. As the number of degrees of freedom depends on the number of independent data in time and space, a time and space dependency has been arbitrarily chosen for this study. A time dependence between 10 consecutive start dates has been considered, which corresponds to 10 years (the Atlantic multidecadal oscillation, for example, can stay in the same phase for more than 10 years). An area dependency of npt = 5 × 5 grid points, which corresponds approximately to an area of 2000 km × 2700 km, has been considered for temperature. A smaller area dependence has been chosen, also arbitrarily, for precipitation: npt = 3 × 3 grid points, corresponding to approximately 1250 km × 1650 km. The degrees of freedom for each spatial averaging g are then calculated as follows:
where $indep_{sd}$ is the number of independent start dates, $nlat$ and $nlon$ are the numbers of grid points in the original grid, and $nlat_g$ and $nlon_g$ are the numbers of grid points for the degree of spatial averaging $g$. For precipitation, this quantity is multiplied by the proportion of land data, as precipitation observations are available only over land.
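Once the degrees of freedom are known, the significance test on a correlation can be sketched in Python. The sketch uses the standard relation between a correlation and the Student's t statistic and takes the degrees of freedom as an input; it does not reproduce the degrees-of-freedom formula of this study, and the critical t value must be supplied by the user (for example, from a table). The function names are illustrative.

```python
import math


def t_statistic(r, dof):
    """Student's t statistic for a correlation r with dof degrees of freedom."""
    if abs(r) >= 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(dof / (1.0 - r * r))


def critical_correlation(t_crit, dof):
    """Smallest positive correlation whose t statistic reaches t_crit."""
    return t_crit / math.sqrt(dof + t_crit * t_crit)


def is_significant(r, dof, t_crit):
    """True when the correlation r exceeds the critical correlation."""
    return r >= critical_correlation(t_crit, dof)
```

Fewer degrees of freedom raise the critical correlation, which is why a strong assumed dependence in time and space makes the test more conservative.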
 Figure 1 shows the 2 m temperature correlation between the decadal hindcasts and the reference data as a function of forecast time and spatial averaging. Each row represents a different domain, the global, the Northern hemisphere (NH), the Southern hemisphere (SH), and the Tropical band, respectively. The first column shows the skill of the NoAssim hindcasts, the second column the Assim skill, and the third one is the skill improvement of Assim over NoAssim. The first column of Figure 1 (NoAssim experiment near-surface temperature skill) shows that when the model is not initialized with observations, the skill grows nearly monotonically with spatial and time averaging. This leads to a maximum of skill regardless of the domain considered at the top right corner of the figure, where the spatial averaging and temporal accumulation are the largest (i.e., the whole domain is averaged, and the time accumulation includes all 120 forecast months), which suggests an increasing role of the varying forcing as the time series are smoothed. The smaller domains show that the NoAssim skill is larger in the extratropics, pointing toward a larger relative contribution to the climate signals of the external radiative forcings over the internal variability in the extratropics compared to the tropics, particularly for the 1 to 5 year time scale.
 Assim shows an additional maximum of skill in the global domain at the beginning of the forecast, centered at accumulated month 10 (Figure 1b). In the NH and SH of the initialized experiment (Figures 1e and 1h), the skill also increases with space and time accumulation, but without the peak that appears in the global domain at the beginning of the forecast accumulation. The skill of Assim in the Tropical band (Figure 1m) shows, in addition to the maximum at large spatial averaging, a peak for short accumulation forecast times. This peak seems to originate from the added value of the initialization associated with the El Niño–Southern Oscillation (ENSO), as illustrated in Figure 2, which is explained in detail below. Figure 1c shows that the improvement given by the initialization in the global domain is significant for the accumulation of the first 40 forecast months. Moreover, the maximum improvement due to the initialization appears at intermediate spatial scales, which is an additional motivation to analyze the skill in smaller domains such as the NH, the SH, and the Tropical band.
 Comparing the skill of the NH and SH for the NoAssim experiment (Figures 1d and 1g), the SH shows less skill. For the Assim experiment (Figures 1e and 1h), the contours are also less steep in the SH. The initialization improvement (Figures 1f and 1i) is significant for the first 12 accumulated months in the NH and the first 18 accumulated months in the SH. At the largest spatial averaging, which corresponds to the whole NH/SH domain (top of Figures 1f and 1i, respectively), the improvement is not significant at the 90% confidence level. Figures S1 and S2 in the supporting information show the results of the study using land-only and ocean-only data, respectively. The maximum skill in the global domain, the NH, and the SH is lower than that in Figure 1, especially over land, because the effective averaged areas are smaller than when both ocean and land are considered together. The monotonic increase in skill with spatial and temporal averaging observed in Figure 1 is lost in Figures S1 and S2. When looking at the difference between Assim and NoAssim, the ocean-only figure (Figure S2) shows a statistically significant (confidence level of 90%) impact of the initialization for longer time scales than the mixed ocean and land data.
 To better identify the different origins of the skill for Assim and NoAssim, Figure 2 illustrates the temperature anomalies in different domains (the global domain in the first row and the tropical domain in the second row) and at different forecast time accumulations (the first forecast month in the first column and the accumulation of all forecast months in the second column). All the grid points of each domain are averaged in all panels. In particular, Figure 2a corresponds to the temperature anomalies at the first forecast month for the 46 start dates used in this study. The anomaly correlation coefficients generated by those anomalies correspond to the top left points in Figures 1a (NoAssim) and 1b (Assim). Similarly, Figure 2b illustrates the temperature anomalies of the averaged forecast period, and the corresponding anomaly correlation coefficient is represented by the top right corner points of Figures 1a and 1b. Analogously, the top left and top right anomaly correlation coefficients in Figures 1l and 1m are generated, respectively, by the anomalies illustrated in Figures 2c and 2d. Figure 2a shows that the NoAssim ensemble has more spread than Assim (this is also shown in Figures 2c and 2d for the tropical domain discussed below). When integrating over time, the role of the radiative forcings becomes dominant, and both Assim and NoAssim have a correlation with the reference data higher than 0.9 (Figure 2b).
 Compared to the other domains, the Tropical band of NoAssim (Figure 1l) has the lowest skill. The skill of Assim in the Tropical band (Figure 1m) shows, in addition to the maximum at the largest spatial averaging, a peak at the beginning of the forecast accumulation. The maximum improvement from the initialization occurs during the accumulation of the first 4 forecast months (Figure 1n); the improvement remains significant up to an accumulation of 40 forecast months and persists up to an accumulation of 50 forecast months. This improvement at the beginning of the forecast is also shown in Figure 2. Figure 2c shows the temperature anomalies of the first forecast month when averaging over the whole tropical band. Assim has a correlation of 0.94, while NoAssim has a correlation of 0.38. These correlations are significantly different at a confidence level of at least 99%. NoAssim reflects the continuous warming without showing any variability consistent in time with the reference data. This is mainly because NoAssim does not have any information about the phase and amplitude of contemporaneous events, while Assim does. The correlation of the observed Niño 3.4 SST index with the tropically averaged reference data is 0.69, which also suggests that the variability in the Tropical band is dominated by the variability of ENSO. Figure S3a of the supporting information shows the Niño 3.4 SST index anomalies for the first forecast month. Assim has a correlation with ERSST of 0.99, while NoAssim has a negative correlation.
 Similar to Figure 2b for the global domain, Figure 2d for the Tropics shows that, when the forecast time is accumulated over the 120 forecast months, both experiments have a correlation with the observations higher than 0.9 due to the large role of the external forcings. The results suggest that most of the skill improvement with initialization found in the global domain is associated with the tropical region. Figure S3b of the supporting information shows that with the accumulation of the whole forecast period, the Niño 3.4 SST index skill decreases with respect to that shown in Figure S3a. As a result, Assim and NoAssim have a similar skill of around 0.55. Figure S3c shows how the Niño 3.4 correlation difference evolves as the forecast-time accumulation increases, with the situations in Figures S3a and S3b being the two extremes. The difference in correlation decreases with forecast-time accumulation. During the first forecast year, the correlation difference is greater than 1, which means that the initialization actually corrects the sign of the NoAssim correlation. Moreover, the correlation is significantly different at the 90% level for accumulations of up to 40 forecast months, which explains the results found for the TRO in Figure 1.
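The sign argument above follows from the bound on any correlation coefficient. Writing $\rho_A$ and $\rho_N$ for the Assim and NoAssim correlations (symbols introduced here for illustration only):

$$\rho_A - \rho_N > 1 \quad\text{and}\quad \rho_A \le 1 \quad\Longrightarrow\quad \rho_N < \rho_A - 1 \le 0 .$$

A correlation difference above 1 can therefore occur only when the NoAssim correlation is negative, whatever the value of the Assim correlation.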
 The results described above are consistent across the different model versions used. Figure S4 shows the skill in the global domain for each individual version of the forecast system. When using the linear trend of the global-mean temperature as a proxy for the climate sensitivity, it was found that the stronger the trend, the larger the skill of the NoAssim experiment (first column of Figure S4) at all spatial and temporal scales. Moreover, the Assim panels (second column of Figure S4) show that the impact of initialization is stronger when the trend is weaker. The model versions in Figure S4 are ranked following the climate sensitivity estimates shown in Figure S5, from the model version with the highest climate sensitivity to the one with the lowest.
 The NoAssim precipitation (first column of Figure 3) has a negative correlation with the CRU data in the global domain (Figure 3a) and the Tropical band (Figure 3g). Some positive skill is found in the NH (Figure 3d) at large-scale averaging and full forecast-time accumulation. The correlation with CRU of the initialized experiment displays significantly positive skill (confidence level of 90%) in every domain (second column of Figure 3) for an accumulation of up to 20 forecast months. The maximum skill is found in the Tropics (Figure 3h) for an accumulation of 12 forecast months. Improvement with the initialization is found in the global domain (Figure 3c) for all forecast-time accumulations and all degrees of spatial averaging except the largest. These improvements are the strongest over the TRO region (Figure 3i) and are mainly due to the correction of the NoAssim negative skill by the initialization.
 In this work, the improvements associated with the initialization of a decadal forecast system have been quantified. Moreover, it has been documented how the added value of the initial condition information varies with the temporal and spatial scales.
 For near-surface temperature, it has been found that when increasing the spatial scales and temporal accumulation, the external forcing influence becomes more important. DePreSys_PP correctly reproduces the surface temperature response to the variation of external forcings. This leads to a maximum of skill at time scales of 10 accumulated years, which is the maximum time scale considered in this study, and for regional to large scales. This is not the case for land precipitation, for which the sign of the correlation with the observations is negative.
 By introducing information on the state of the climate system through the initialization, a new peak of surface temperature skill appears, extending from the beginning of the forecast to the first 40 forecast months in the global domain and the tropical band. This seems to be due to the correct prediction of ENSO, whose predictability is usually considered limited to 1 year. Globally, the skill improvements due to the initialization come mainly from the Tropics.
 The NH is more skillful than the SH for near-surface temperature. The improvements brought by the initialization are statistically significant at the 90% confidence level from the grid point scale to the regional spatial scale, for the accumulation of the first 12 forecast months in the NH and of the first 20 forecast months in the SH. The lack of statistically significant results when averaging over the whole hemisphere might be due to the small number of independent data available. A longer reforecast period would be necessary to get more robust results.
 The skill results for near-surface temperature are consistent across the different model versions. When using the linear trend of the global-mean temperature as a proxy for the climate sensitivity, it was found that in the NoAssim experiment, the stronger the trend, the larger the skill, at all spatial and temporal scales. The Assim experiment shows that the impact of initialization is stronger when the trend is weaker.
 NoAssim has no precipitation skill in the global domain and the Tropics at almost any spatial and time scale. The initialization corrects the negative sign of the NoAssim correlation and has a beneficial impact for all time scales and spatial averaging in both domains. Much is still needed to improve multiannual precipitation forecasts, especially considering that precipitation is a key variable with large socioeconomic consequences.
 This work was supported by the EU-funded QWeCI (FP7-ENV-2009-1-243964), SPECS (FP7-ENV-2012-308378), and NACLIM (FP7-ENV-2012-308299) projects, the MICINN-funded RUCSS (CGL2010-20657) project, and the Catalan Government. The authors thankfully acknowledge the computer resources, technical expertise, and assistance provided by the Red Española de Supercomputación (RES).
 The Editor thanks one anonymous reviewer for his/her assistance in evaluating this paper.